Abstract


A quite general class of Optimal Bounding Ellipsoid (OBE) algorithms, including all methods
published to date, can be unified into a single framework called the Unified OBE (UOBE)
algorithm. UOBE is based on generalized weighted recursive least squares in which very broad
classes of “forgetting factors” and data weights may be employed. Different instances of UOBE are
distinguished by their weighting policies and the criteria used to determine their optimal values.

A study of existing OBE algorithms, with a particular interest in the tradeoff between the
interpretability of algorithm performance and convergence properties, is presented. Results suggest
that an interpretable, converging UOBE algorithm will be found. In this context, a new UOBE technique,
the set membership stochastic approximation (SM-SA) algorithm, is introduced. SM-SA possesses
interpretable optimization measures and known conditions under which its estimator will converge.
1  Introduction

Set-membership-based  (SM)  system  identification  algorithms  offer  an  interesting  alternative  to 
conventional  techniques.  SM  methods  have  been  receiving  increasing  attention  internationally  as 
is  evident  from  the  collection  of  papers  in  this  volume.  Recent  reviews  of  this  field  are  found,  for 
example, in [1]-[3]. This paper is restricted to the class of algorithms known as optimal bounding
ellipsoid  (OBE)  algorithms  which  follow  from  a  bounded  error  constraint. 

In  this  paper  we  initially  formulate  a  very  broad  class  of  OBE  algorithms,  including  all  methods 
published  to  date,  into  a  general  framework  called  the  Unified  OBE  (UOBE)  algorithm.  We  then 
exploit  the  UOBE  formalism  to  explore  some  interesting  connections  which  exist  among  existing 
OBE algorithms. A particular concern will be the pursuit of an OBE algorithm which has both
well-understood convergence properties and an intuitively meaningful optimization criterion. These two
desirable  properties  have  yet  to  be  combined  into  a  single  OBE  algorithm. 

2  A  Unified  OBE  (UOBE)  Algorithm 

2.1  The  Bounded  Error  Problem  and  the  UOBE 

The bounded error identification problem is as follows: Assume that we are observing some physical
system which is generating the sequence $\{y(\cdot)\} \in \mathbb{C}^k$ in response to the input $\{u(\cdot)\} \in \mathbb{C}^l$. $\{u(\cdot)\}$ is a
realization of an ergodic, wide sense stationary stochastic process. Both input and output sequences
are measurable. We assume the existence of a “true” regression model of the form

$$ y(n) = \Theta_*^H x(n) + \varepsilon_*(n) \tag{1} $$

in which $x(n)$ is an $m$-vector of known functions,

$$ x(n) = \begin{bmatrix}
\varphi_1[y(n-1), y(n-2), \ldots, y(n-p), u(n), u(n-1), \ldots, u(n-q)] \\
\varphi_2[y(n-1), y(n-2), \ldots, y(n-p), u(n), u(n-1), \ldots, u(n-q)] \\
\vdots \\
\varphi_m[y(n-1), y(n-2), \ldots, y(n-p), u(n), u(n-1), \ldots, u(n-q)]
\end{bmatrix}, \tag{2} $$

and where $\{\varepsilon_*(\cdot)\} \in \mathbb{C}^k$ is a realization of a zero-mean, second moment ergodic, complex
vector-valued random sequence whose vector components are independent. The matrix $\Theta_* \in \mathbb{C}^{m \times k}$
parameterizes the model. At time $n$ we wish to use the observed data on $t \in [1,n]$ to deduce
an estimated model of the same form. The parameter estimate is denoted by $\Theta(n)$ and the residual
process by $\varepsilon(\cdot,\Theta(n))$. The dependence of the residual upon the parameter estimates is highly
significant, so it is shown explicitly.

As  the  basis  for  the  unified  algorithm,  we  recall  the  identification  algorithm  variously  known 
as  weighted  recursive  least  squares  (WRLS)  (e.g.  [6], [7]),  weighted  sequential  least  squares  (e.g. 
[8]),  weighted  sequential  regression  (e.g.  [9]),  and  other  names.  We  shall  use  the  name  “WRLS” 
throughout.  The  WRLS  algorithm  is  used  to  sequentially  compute  the  weighted  least  square  error 


estimate,

$$ \Theta(n) = \arg\min_{T \in \mathbb{C}^{m \times k}} \sum_{\tau=1}^{n} w_{n,\tau} \, \| \varepsilon(\tau,T) \|^2, \tag{3} $$


where, in the most general case, the data weights $w_{n,\tau}$ may be time-varying (dependent upon both
$n$ and $\tau$) in a simple way,

$$ w_{n,\tau} = \begin{cases} \alpha_n w_{n-1,\tau}, & \tau \le n-1 \\ \beta_n, & \tau = n \end{cases} \tag{4} $$

where the sequences $\{\alpha_n\}$ and $\{\beta_n\}$ will be specified below (for the present, they should just be
regarded as sequences of finite numbers). In these terms, the WRLS relations are¹


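$$ C_\alpha^{-1}(n-1) = C^{-1}(n-1)/\alpha_n \tag{5} $$

$$ C^{-1}(n) = C_\alpha^{-1}(n-1) - \beta_n \frac{C_\alpha^{-1}(n-1)\,x(n)\,x^H(n)\,C_\alpha^{-1}(n-1)}{1 + (\beta_n/\alpha_n)\,G(n)} \tag{6} $$

$$ \Theta(n) = \Theta(n-1) + \beta_n C^{-1}(n)\,x(n)\,\varepsilon^H(n,\Theta(n-1)) \tag{7} $$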


with $C(0) = 0$ and where $G(n) \triangleq x^H(n)C^{-1}(n-1)x(n)$. From (4) we note that the number
$\alpha_n$ effectively scales all previous weights at time $n$ to (in conventional applications of WRLS)
decrease the influence of the corresponding data on the estimates. Accordingly, $\{\alpha_n\}$ is often called
a sequence of forgetting factors, and in many cases the sequence is taken to be a constant which is
smaller than, but close to, unity. Either $\{\alpha_n\}$ or $\{\beta_n\}$, or both, may be omitted (set to unity), but
we will have use for both sequences in this work. The matrix $C(n)$, usually called the covariance
matrix², is by definition the sum of the weighted outer products,

$$ C(n) = \sum_{\tau=1}^{n} w_{n,\tau}\, x(\tau)x^H(\tau) = \alpha_n C(n-1) + \beta_n x(n)x^H(n). \tag{8} $$
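The WRLS relations can be sketched in a few lines of code. The following is a minimal illustration only, assuming real-valued data, a single output ($k = 1$), and an invertible starting matrix (the text's $C(0) = 0$ initialization is not handled). The names `mat_vec` and `wrls_step` are hypothetical and do not come from the paper.

```python
def mat_vec(A, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def wrls_step(P_prev, theta_prev, x, y, alpha, beta):
    """One WRLS update for a real, single-output model y = theta^T x + noise.

    P_prev is C^{-1}(n-1), assumed symmetric; alpha and beta are the weights
    alpha_n and beta_n. Returns (C^{-1}(n), theta(n)).
    """
    m = len(x)
    # Scaled inverse: C_alpha^{-1}(n-1) = C^{-1}(n-1) / alpha_n
    Pa = [[p / alpha for p in row] for row in P_prev]
    Pa_x = mat_vec(Pa, x)
    # G(n) = x^T C^{-1}(n-1) x
    G = sum(xi * pi for xi, pi in zip(x, mat_vec(P_prev, x)))
    denom = 1.0 + (beta / alpha) * G
    # Rank-one correction of the inverse (uses symmetry of Pa)
    P = [[Pa[i][j] - beta * Pa_x[i] * Pa_x[j] / denom for j in range(m)]
         for i in range(m)]
    # Correct the estimate with the a priori residual
    eps = y - sum(t * xi for t, xi in zip(theta_prev, x))
    gain = mat_vec(P, x)
    theta = [t + beta * g * eps for t, g in zip(theta_prev, gain)]
    return P, theta


# Example: start from C(n-1) = I, theta(n-1) = 0 (illustrative values only)
P, theta = wrls_step([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
                     [1.0, 2.0], 3.0, alpha=0.5, beta=2.0)
```

In this example the returned `P` is the inverse of $\alpha_n C(n-1) + \beta_n x(n)x^T(n)$, which is the relation the covariance recursion requires.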


The recursions above theoretically provide an estimate $\Theta(n)$ which is equivalent to the solution of
the normal equations (e.g. [6]),

$$ C(n)\,\Theta(n) = C_{xy}(n) \tag{9} $$

with

$$ C_{xy}(n) = \sum_{\tau=1}^{n} w_{n,\tau}\, x(\tau)\,y^H(\tau). \tag{10} $$

¹ For completeness, we also note that the WRLS algorithm can be implemented in a different form using
QR-decomposition (e.g. [9]), and the QR form has been employed with some advantages in some of the SM-based
algorithms to be discussed below (e.g. [10]-[13]). Because our purpose here is to relate many existing developments,
we shall focus on the more conventional approach represented by (6) and (7).

² Though it is more properly a normal matrix [6].

It  is  not  widely  appreciated  that  all  reported  OBE  algorithms  can  be  unified  into  a  general 
framework  which  we  shall  call  the  Unified  OBE  (UOBE)  algorithm.  Particular  algorithms  are 
distinguished  by  specifying  the  optimization  strategy  for  determining  the  sequences  of  weights 
$\{\alpha_n\}$ and $\{\beta_n\}$.

Let  us  initially  present  the  UOBE  framework,  then  enumerate  the  particular  algorithms.  UOBE 
algorithms  arise  from  a  bounded  error  constraint: 

$$ \| \varepsilon_*(n) \|^2 < \gamma_n, \tag{11} $$

where $\{\gamma_n\}$ is a known positive sequence. At time $n$, a set of system parameters, say $\Omega(n)$, can
be found which are consistent with the observations and this sequence of bounds. The exact set is
difficult to describe and track, but, in conjunction with WRLS processing, $\Omega(n)$ can be shown to
be contained in a superset of the form (e.g. [3]-[5])

$$ \bar{\Omega}(n) = \left\{ \Theta \;\middle|\; \operatorname{tr}\left\{ [\Theta - \Theta(n)]^H \frac{C(n)}{\kappa(n)} [\Theta - \Theta(n)] \right\} < 1 \right\} \tag{12} $$

where $\operatorname{tr}\{\cdot\}$ denotes the trace of a matrix, $\Theta(n)$ is the WRLS parameter estimate at time $n$ using
weights $\{w_{n,\tau},\ \tau \in [1,n]\}$, $C(n)$ is the weighted covariance matrix, and $\kappa(n)$ is the scalar quantity

$$ \kappa(n) = \operatorname{tr}\{\Theta^H(n)\,C(n)\,\Theta(n)\} + \sum_{\tau=1}^{n} w_{n,\tau} \left[ \gamma_\tau - \| y(\tau) \|^2 \right]. \tag{13} $$

$\bar{\Omega}(n)$ is a hyperellipsoid in $\mathbb{R}^{mk}$, with its center at $\Theta(n)$. By examining a single output, say $y_i(\cdot)$,
the $i$th component of $y(\cdot)$, we see that a common “ellipsoid matrix” $C(n)/\kappa(n)$ is shared by each
of the individual outputs, but that each is centered on a different parameter estimate represented
by column $i$ of $\Theta(\cdot)$. We conclude therefore that under bounded error constraints, a hyperellipsoid
can be associated with a WRLS recursion and conversely.
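As an illustration of the bounding set, the membership test for a candidate parameter vector can be sketched as follows. This is a sketch under stated assumptions: real-valued data, a single output, and the convention (an assumption here, not stated in the text) that a non-positive $\kappa(n)$ is treated as an empty set. The function name `in_ellipsoid` is hypothetical.

```python
def in_ellipsoid(theta, center, C, kappa):
    """True if (theta - center)^T C (theta - center) < kappa.

    For kappa > 0 this is the same as the quadratic form divided by kappa
    being less than one. kappa <= 0 is treated as an empty set (assumption).
    """
    if kappa <= 0:
        return False
    d = [t - c for t, c in zip(theta, center)]
    Cd = [sum(cij * dj for cij, dj in zip(row, d)) for row in C]
    quad = sum(di * cdi for di, cdi in zip(d, Cd))
    return quad < kappa


# Illustrative values only
C = [[2.0, 0.0], [0.0, 0.5]]
inside = in_ellipsoid([1.5, 1.5], [1.0, 1.0], C, 2.0)
outside = in_ellipsoid([3.0, 1.0], [1.0, 1.0], C, 2.0)
```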

The weights $\{w_{n,\tau},\ \tau \in [1,n]\}$ directly control the size, orientation, and location of the ellipsoid
in the parameter space at time $n$. However, because of the structure of the WRLS recursions, in
moving from time $n-1$ to $n$, we are not free to alter the set of weights beyond that which can be
accomplished using the numbers $\alpha_n$ and $\beta_n$. This is evident in the recursions (6) and (7). At most,
therefore, we have two free parameters with which to control $\bar{\Omega}(n)$. All existing UOBE algorithms
differ only in their sequences $\{\alpha_n\}$ and $\{\beta_n\}$, and the optimization criterion used to determine
them. The central objective of the general UOBE algorithm is to employ the weights $\alpha_n$ and/or
$\beta_n$ in the context of WRLS estimation to sequentially minimize the ellipsoid size in some sense. A
significant benefit is that often no weights exist which can minimize the ellipsoid, indicating that
the incoming data set is uninformative in the SM sense and need not be processed.

At time $n$,

1. In conjunction with the incoming data set $(y(n), x(n))$, find optimal values of $\alpha_n$ and/or $\beta_n$,
say $\alpha_n^*$ and/or $\beta_n^*$. Optimality criteria are described in the text;

2. If optimal positive (and sometimes further constrained) values $\alpha_n^*$ and/or $\beta_n^*$ do not exist,
then discard the data set (set $\beta_n^* = 0$ and/or $\alpha_n^* = 1$);

3. Update $C(n)$, $\Theta(n)$, and $\kappa(n)$ using (6), (7), and a recursion for $\kappa(\cdot)$ described in Lemma 1.

Figure 1: General steps of the UOBE algorithm.

All  UOBE  algorithms  adhere  to  the  three  general  steps  displayed  in  Fig.  1.  (For  details  of 
initialization  and  other  nuances  of  the  specific  algorithms,  the  reader  is  referred  to  original  papers 
cited  above  and  below.)  Having  established  the  basic  framework  of  UOBE,  we  next  consider  the 
optimization  process. 
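The control flow of the three general steps can be sketched in code. The following is a minimal illustration, assuming a scalar real parameter ($m = k = 1$) and a strictly positive $C(n-1)$. The optimization step is left as a caller-supplied function, and `toy_policy` is a hypothetical placeholder, not one of the optimality criteria discussed in this paper.

```python
def uobe_step(state, x, y, gamma, choose_weights):
    """One pass through the three general UOBE steps for a scalar real model.

    state = (C, theta, kappa) at time n-1, with C > 0.
    choose_weights returns (alpha, beta), or None when no acceptable
    optimal weights exist. Returns (new_state, accepted).
    """
    C, theta, kappa = state
    eps = y - theta * x          # a priori residual
    G = x * x / C                # G(n) for a scalar parameter
    # Step 1: ask the weighting policy for alpha_n and beta_n
    weights = choose_weights(eps * eps, G, kappa, gamma)
    # Step 2: no acceptable weights, so discard the data (alpha = 1, beta = 0)
    if weights is None:
        return state, False
    alpha, beta = weights
    # Step 3: update C(n), theta(n), and kappa(n)
    C_new = alpha * C + beta * x * x
    theta_new = theta + beta * x * eps / C_new
    kappa_new = (alpha * kappa + beta * gamma
                 - alpha * beta * eps * eps / (alpha + beta * G))
    return (C_new, theta_new, kappa_new), True


def toy_policy(eps2, G, kappa_prev, gamma):
    """Hypothetical placeholder rule, NOT an optimality criterion from the text."""
    return (1.0, 1.0) if eps2 > gamma else None


# Illustrative values only
state, accepted = uobe_step((1.0, 0.0, 1.0), 1.0, 1.5, 1.0, toy_policy)
same_state, rejected = uobe_step((1.0, 0.0, 1.0), 1.0, 0.5, 1.0, toy_policy)
```

Keeping the policy separate from the update mirrors the point made in the text: the algorithms share the update and differ only in how the weights are chosen.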

2.2  Optimization 

Within the UOBE framework in Fig. 1, the different algorithms are distinguished by their sequences
$\{\alpha_n\}$ and $\{\beta_n\}$, in conjunction with the optimization criterion employed in selecting them. Three
optimization criteria have been used. The first two involve set measures on the ellipsoid $\bar{\Omega}(n)$
and are clearly interpretable with minimal explanation, while the third will require some further
elaboration. The criteria are:

Optimization Criterion 1 Minimize the determinant of the inverse ellipsoid matrix,

$$ \mu_v\{\bar{\Omega}(n)\} = \det\{\kappa(n)\,C^{-1}(n)\} \tag{14} $$

(henceforth $\mu_v(n)$ for simplicity);

Optimization Criterion 2 Minimize the trace of the inverse ellipsoid matrix,

$$ \mu_t\{\bar{\Omega}(n)\} = \operatorname{tr}\{\kappa(n)\,C^{-1}(n)\} \tag{15} $$

(henceforth $\mu_t(n)$); and,

Optimization Criterion 3 Minimize the parameter $\kappa(n)$.

Criteria 1 and 2 were first suggested by Fogel and Huang [4], while criterion 3 was used by Dasgupta
and Huang [14]. In these original papers, single output (real, scalar sequence $y(\cdot)$) systems were
considered. In the single output case, in which $\bar{\Omega}(n)$ is clearly interpretable as a hyperellipsoid
in $\mathbb{R}^m$, $\mu_v(n)$ is proportional to the square of the volume of the ellipsoid, while $\mu_t(n)$ is
proportional to the sum of squares of its semi-axes. The same two measures are meaningful in the
multiple output case, since they result in the minimization of the volume or trace of the common
ellipsoid shared by all the outputs (see discussion below (13)). Of course, an important feature of
any optimization criterion is that it be readily interpretable as a desirable objective. The volume
and trace criteria apparently have this property. Accordingly, UOBE algorithms following these
criteria will often be referred to as interpretable algorithms in the following. Criterion 3, however,
has been the subject of some controversy with regard to its meaningfulness and interpretability, as
we discuss later in the paper. This criterion has been used in conjunction with a specific weighting
strategy to achieve a rigorous proof of convergence in a certain sense. The apparent need to trade
interpretability of a UOBE algorithm for proof of convergence will be one of the central themes of
the remaining parts of the paper.

Having established the optimization criteria, let us now focus on the weight sequences. For any
of the criteria above, there is only one quantity to be optimized at time $n$, which in each case is
dependent upon both $\alpha_n$ and $\beta_n$. However, the numbers $\alpha_n$ and $\beta_n$ are essentially independent of
one another, so that any attempt to optimize one of the criterion measures with respect to both
$\alpha_n$ and $\beta_n$ results in an infinity of solutions which is resolved by arbitrarily choosing a value of
either weight. Accordingly, we may either tie the weights together through some functional relation,
optimize over only one weight and choose the other according to some predetermined purpose, or
simply eliminate the “unused” weight altogether by setting it to unity (a special case of the second
strategy). We shall adopt the policy of writing the weights $\alpha_n$ and $\beta_n$ as functions of a single
parameter to be optimized at time $n$, say $\lambda_n$, so that (in conventionally abusive notation)

$$ \alpha_n = \alpha_n(\lambda_n) \tag{16} $$

$$ \beta_n = \beta_n(\lambda_n) \tag{17} $$

where $\{\alpha_n\}$ and $\{\beta_n\}$ should now be considered to be sequences of functions whose properties will
be specified later. So that we have sufficient generality for our purposes, it is important to note
that these functions need not depend on $\lambda_n$. For example, $\{\alpha_n\}$ may be chosen independently of
the optimization, in which case, at time $n$,

$$ \alpha_n(\lambda_n) = \alpha_n, \text{ a constant (function) independent of } \lambda_n. \tag{18} $$

In this special case, it is true that

$$ \frac{d}{d\lambda_n}\,\alpha_n(\lambda_n) = 0. \tag{19} $$

We shall refer to $\lambda_n$ as a “weight” at time $n$, since if $\alpha_n(\lambda_n) = 1$ and $\beta_n(\lambda_n) = \lambda_n$, then $\lambda_n$ is
simply the weight associated with the standard WRLS recursion with no forgetting factor. This
general setup will allow us to embrace all UOBE algorithms in a single theoretical framework.

We  now  turn  to  the  problem  of  optimization  of  the  identification  at  time  n  according  to  the 
criteria  stated  above.  The  following  results  generalize  and  unify  all  optimization  procedures  found 
in  the  literature. 


Theorem 1 Each of the functions in the sequences $\{\alpha_n(\lambda_n)\}$ and $\{\beta_n(\lambda_n)\}$ is assumed positive,
and the functions are chosen such that, for each $n$, $q_n(\lambda_n) = \beta_n(\lambda_n)/\alpha_n(\lambda_n)$ is a continuous, one-to-one mapping

$$ q_n : (0, a_n) \to (0, \infty) \tag{20} $$

where $a_n > 0$. Then, if it exists, a weight $\lambda_n^*$ which minimizes

1. the volume measure $\mu_v(n)$ is the unique positive root in $\lambda_n$ of the equation $F_v(q_n(\lambda_n)) = 0$ on
the interval $(0, a_n)$, where

$$ F_v(s) = a_2 s^2 + a_1 s + a_0 \tag{21} $$

with

$$ \begin{aligned}
a_2 &= (mk - 1)\,\gamma_n\, G^2(n), \\
a_1 &= \left\{ (2mk - 1)\,\gamma_n + \| \varepsilon(n,\Theta(n-1)) \|^2 - \kappa(n-1)\,G(n) \right\} G(n), \\
a_0 &= mk \left[ \gamma_n - \| \varepsilon(n,\Theta(n-1)) \|^2 \right] - \kappa(n-1)\,G(n);
\end{aligned} $$

2. the trace measure $\mu_t(n)$ is the unique positive root in $\lambda_n$ of the equation $F_t(q_n(\lambda_n)) = 0$ on
the interval $(0, a_n)$, where

$$ F_t(s) = b_3 s^3 + b_2 s^2 + b_1 s + b_0 \tag{22} $$

with

$$ \begin{aligned}
b_3 &= \gamma_n\, G^2(n) \left( G(n) - I(n-1)\,H(n) \right), \\
b_2 &= 3\gamma_n\, G(n) \left[ G(n) - I(n-1)\,H(n) \right], \\
b_1 &= H(n)\,G(n)\,I(n-1)\,\kappa(n-1) - 2H(n)\,I(n-1) \left[ \gamma_n - \| \varepsilon(n,\Theta(n-1)) \|^2 \right] \\
    &\quad - G(n)\, \| \varepsilon(n,\Theta(n-1)) \|^2 + 3\gamma_n\, G(n), \\
b_0 &= \gamma_n - \| \varepsilon(n,\Theta(n-1)) \|^2 - H(n)\,I(n-1)\,\kappa(n-1),
\end{aligned} $$

where $H(n) \triangleq x^H(n)\,C^{-2}(n-1)\,x(n)$, and $I(n) \triangleq \operatorname{tr} C(n)$.


Before sketching the proof of this important result, we remark that it is unnecessary to actually
solve for the root of $F_v(q_n(\lambda_n)) = 0$ or $F_t(q_n(\lambda_n)) = 0$ to determine whether a positive root exists.
Based on generalizations of previous work [3]-[5], it can be shown that necessary and sufficient tests
for the existence of the positive root are $a_0 < 0$ and $b_0 < 0$ in the volume and trace minimization
cases, respectively. This fact will play an important role in future discussions.
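The volume case of Theorem 1, together with the existence test on $a_0$, can be sketched as follows. This is a minimal illustration, assuming $mk \ge 2$, $\gamma_n > 0$, and $G(n) > 0$ so that the quadratic coefficient is strictly positive. The function returns the root in terms of the ratio $q_n$; recovering $\lambda_n^*$ requires inverting the particular mapping $q_n(\cdot)$ in use. The name `volume_ratio` is hypothetical.

```python
import math


def volume_ratio(m, k, gamma, G, eps2, kappa_prev):
    """Positive root of F_v(s) = a2*s^2 + a1*s + a0, or None if a0 >= 0.

    eps2 is the squared norm of the a priori residual and kappa_prev is
    kappa(n-1). Assumes m*k >= 2, gamma > 0, and G > 0, so that a2 > 0.
    """
    mk = m * k
    if mk < 2:
        raise ValueError("this sketch assumes m*k >= 2")
    a2 = (mk - 1) * gamma * G * G
    a1 = ((2 * mk - 1) * gamma + eps2 - kappa_prev * G) * G
    a0 = mk * (gamma - eps2) - kappa_prev * G
    # Existence test: no positive root unless a0 < 0, so the data are discarded
    if a0 >= 0:
        return None
    # With a2 > 0 and a0 < 0 the discriminant is positive and exactly
    # one root is positive
    disc = a1 * a1 - 4.0 * a2 * a0
    return (-a1 + math.sqrt(disc)) / (2.0 * a2)


# Illustrative values only
rho_star = volume_ratio(2, 1, 1.0, 1.0, 4.0, 1.0)
rejected = volume_ratio(2, 1, 1.0, 1.0, 0.1, 1.0)
```

Checking the sign of `a0` first means the square root is only computed for data that will actually be used.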

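Sketch of Proof: Write the (optimal) normal equations (9) at time $n$ in the form

$$ \left[ \alpha_n(\lambda_n^*)\,C(n-1) + \beta_n(\lambda_n^*)\,x(n)x^H(n) \right] \Theta(n) = \alpha_n(\lambda_n^*)\,C_{xy}(n-1) + \beta_n(\lambda_n^*)\,x(n)y^H(n). \tag{23} $$

Now divide through by $\alpha_n(\lambda_n^*)$ to yield

$$ \left[ C(n-1) + q_n(\lambda_n^*)\,x(n)x^H(n) \right] \Theta(n) = C_{xy}(n-1) + q_n(\lambda_n^*)\,x(n)y^H(n) \tag{24} $$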


where $q_n(\cdot)$ is defined in the theorem. This shows that an identical estimate (and concomitant
optimization problem) results if the covariance matrix is unweighted and the outer product is
weighted by $q_n(\lambda_n^*)$. In principle, then, to obtain the desired estimate, $\Theta(n)$, the dependence of
$q_n(\lambda_n^*)$ upon $\lambda_n^*$ is superfluous and we can optimize over, say, $\rho_n = q_n(\lambda_n)$, ignoring the dependence
upon $\lambda_n$. For this simple case, it has been proven in [3],[5] that

$$ \frac{d\mu_v}{d\rho_n} = K(\rho_n)\,F_v(\rho_n) \tag{25} $$


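where $K(\rho_n) > 0$ for all $\rho_n > 0$, and where $F_v(\rho_n)$ is as defined in (21). Moreover, $F_v(\rho_n)$ has at
most one positive root, say $\rho_n^*$, which, if it exists, corresponds to a minimum of $\mu_v(n)$ since

$$ \left. \frac{d^2\mu_v}{d\rho_n^2} \right|_{\rho_n^*} = K(\rho_n^*) \left. \frac{dF_v}{d\rho_n} \right|_{\rho_n^*} + \left. \frac{dK}{d\rho_n} \right|_{\rho_n^*} F_v(\rho_n^*) = K(\rho_n^*) \left. \frac{dF_v}{d\rho_n} \right|_{\rho_n^*} > 0. \tag{26} $$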


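Now return to the case in which $\{\rho_n\}$ represents a sequence of functions of the parameter $\lambda_n$, $\{q_n(\lambda_n)\}$.
It follows immediately from (25) that, at time $n$,

$$ \frac{d\mu_v}{d\lambda_n} = K(q_n(\lambda_n))\,F_v(q_n(\lambda_n))\,\frac{dq_n}{d\lambda_n}. \tag{27} $$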


Because of the assumed monotonicity of $q_n(\lambda_n)$, the only zeros of the derivative occur when
$F_v(q_n(\lambda_n)) = 0$. Since there is a unique root in $\rho_n$, viz. $\rho_n^*$, and since $q_n(\cdot)$ is an invertible
function, there is a unique root in $\lambda_n$, viz. $\lambda_n^* = q_n^{\mathrm{inv}}(\rho_n^*)$, where $q_n^{\mathrm{inv}}(\cdot)$ is the inverse mapping of
$q_n(\cdot)$. Similar analysis for the trace case follows from the work in [3] in which $\alpha_n(\lambda_n)$ is taken to be
independent of $\lambda_n$. □

Theorem 1 does not embrace optimization by minimization of $\kappa(n)$. We shall find this criterion
to be problematic from several points of view. In the present situation, the fact that $\kappa(n)$ cannot
be expressed as a function of $q_n(\lambda_n)$ alone precludes the derivation of a general result like those in
Theorem 1. However, we provide a result which will be useful in future discussions:

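Theorem 2 Consider the optimization problem posed above. If it exists, the optimal weight $\lambda_n^*$
which minimizes $\kappa(n)$ is a root of the equation

$$ \begin{aligned}
F_\kappa(s) ={}& \left\{ (\alpha_n(s) + \beta_n(s)G(n))^2\, \kappa(n-1) - \beta_n^2(s)\,G(n)\, \| \varepsilon(n,\Theta(n-1)) \|^2 \right\} \alpha_n'(s) \\
&+ \left\{ (\alpha_n(s) + \beta_n(s)G(n))^2\, \gamma_n - \alpha_n^2(s)\, \| \varepsilon(n,\Theta(n-1)) \|^2 \right\} \beta_n'(s),
\end{aligned} \tag{28} $$

where $\alpha_n'$ and $\beta_n'$ indicate derivatives.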

This inelegant result will simplify to useful quadratics in two special cases in the paper. It is proved
by taking the total derivative with respect to $\lambda_n$ of the recursive expression for $\kappa(n)$ found in the
following lemma. This lemma has been proven for the case $\alpha_n(\lambda_n) = 1$, $\beta_n(\lambda_n) = \lambda_n$ in [5] and for
$\alpha_n(\lambda_n) = \alpha_n$ (independent of $\lambda_n$), $\beta_n(\lambda_n) = \lambda_n$ in [3]. The generalization given here follows from
similar analysis.
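As a quick worked check of how (28) collapses, consider the simplest weighting, $\alpha_n(s) = 1$ and $\beta_n(s) = s$, so that $\alpha_n'(s) = 0$ and $\beta_n'(s) = 1$. This is a sketch obtained by direct substitution into (28), not a restatement of a result from the cited papers:

```latex
F_\kappa(s) = (1 + sG(n))^2\,\gamma_n - \|\varepsilon(n,\Theta(n-1))\|^2 = 0
\quad\Longrightarrow\quad
s = \frac{1}{G(n)}\left( \sqrt{\frac{\|\varepsilon(n,\Theta(n-1))\|^2}{\gamma_n}} - 1 \right)
```

The first bracket of (28) drops out because it multiplies $\alpha_n'(s) = 0$. Under this weighting a positive root exists only when $\|\varepsilon(n,\Theta(n-1))\|^2 > \gamma_n$.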


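Lemma 1 The sequence $\kappa(n)$ can be computed recursively using

$$ \kappa(n) = \alpha_n(\lambda_n^*)\,\kappa(n-1) + \beta_n(\lambda_n^*)\,\gamma_n - \frac{\alpha_n(\lambda_n^*)\,\beta_n(\lambda_n^*)\, \| \varepsilon(n,\Theta(n-1)) \|^2}{\alpha_n(\lambda_n^*) + \beta_n(\lambda_n^*)\,G(n)} \tag{29} $$

with $\alpha_1(\lambda_1^*)\,\kappa(0) = 0$.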
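The recursion for $\kappa(n)$ can be checked numerically against its definition in (13). The sketch below is a minimal illustration, assuming a scalar real model ($m = k = 1$), a nonzero first regressor, and arbitrary positive weights. For the first sample it uses $\kappa(1) = \beta_1 \gamma_1$, which follows from (13) when $C(0) = 0$. The names `kappa_direct` and `run` are hypothetical.

```python
def kappa_direct(C, theta, weights, gammas, ys):
    """kappa(n) from its definition, for a scalar real model."""
    return theta * theta * C + sum(
        w * (g - y * y) for w, g, y in zip(weights, gammas, ys))


def run(data, alphas, betas, gammas):
    """Process (x, y) pairs; return (kappa by recursion, kappa by definition)."""
    C = Cxy = kappa = 0.0
    weights = []
    for (x, y), a, b, g in zip(data, alphas, betas, gammas):
        if C == 0.0:
            # First sample: theta(1) = y/x, so the definition gives beta*gamma
            kappa = b * g
        else:
            theta_prev = Cxy / C
            eps = y - theta_prev * x          # a priori residual
            G = x * x / C
            kappa = a * kappa + b * g - a * b * eps * eps / (a + b * G)
        weights = [a * w for w in weights] + [b]   # weight recursion (4)
        C = a * C + b * x * x                      # weighted covariance
        Cxy = a * Cxy + b * x * y                  # weighted cross term
    theta = Cxy / C
    ys = [y for _, y in data]
    return kappa, kappa_direct(C, theta, weights, gammas, ys)


# Illustrative values only
recursive, direct = run([(1.0, 0.9), (2.0, 2.3), (-1.0, -1.2)],
                        alphas=[1.0, 0.9, 0.8],
                        betas=[1.0, 0.5, 2.0],
                        gammas=[0.25, 0.25, 0.25])
```

The two returned values should agree to rounding error, which is what makes the recursive form usable in step 3 of the algorithm.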


In conjunction with these general optimization results, we present the following corollary which
asserts some remarkable facts about the quantities upon which the various UOBE algorithms are
based. Once again, we see $\kappa(n)$ to have exceptional behavior relative to the more interpretable
criteria:

Corollary 1 Consider a UOBE algorithm in which volume or trace is to be minimized. Let $q_n(\lambda_n)$
be as described in Theorem 1 for each $n$. Then, the following are independent of the choices of
function sequences $\{\alpha_n(\lambda_n)\}$ and $\{\beta_n(\lambda_n)\}$:

1. the sequence of measures of optimality ($\{\mu_v(n)\}$ or $\{\mu_t(n)\}$);

2. the data points selected (times for which there exists $\lambda_n^* > 0$);

3. the parameter matrix estimate, $\Theta(n)$.

However, in a UOBE algorithm with $\kappa$ minimization, none of these items is independent of the
sequences $\{\alpha_n(\lambda_n)\}$ and $\{\beta_n(\lambda_n)\}$.


Sketch of Proof: Consider first the volume and trace cases. The independence of $\Theta(n)$ follows
from the fact that there is a unique root (if any), $\rho_n^* = q_n(\lambda_n^*)$, of either (21) or (22), which does
not depend on functions $\alpha_n$ or $\beta_n$. Therefore, $\Theta(n)$ is given by (24) regardless of the choices of $\alpha_n$
and $\beta_n$. However, consider $C(n)$ for a particular choice of $\alpha_n(\lambda_n)$ as written in the brackets on the
left side of (23). It follows that

$$ \frac{C(n)}{\alpha_n(\lambda_n^*)} = C(n-1) + q_n(\lambda_n^*)\,x(n)x^H(n). \tag{30} $$

Since the number $q_n(\lambda_n^*)$ does not depend on the choice of $\alpha_n$, the right side is invariant with $\alpha_n$.
$C(n)$ must vary with choice of $\alpha_n$ on the left to maintain the equality. A similar analysis pertains
to the ratio $C(n)/\beta_n(\lambda_n^*)$. Also, from (29)


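$$ \frac{\kappa(n)}{\alpha_n(\lambda_n^*)} = \kappa(n-1) + q_n(\lambda_n^*)\,\gamma_n - q_n(\lambda_n^*)\, \frac{\| \varepsilon(n,\Theta(n-1)) \|^2}{1 + q_n(\lambda_n^*)\,G(n)} \tag{31} $$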


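and a similar argument applies to show that $\kappa(n)$ depends on $\alpha_n$. The ratio $\kappa(n)/\beta_n(\lambda_n^*)$ is formed
to show dependence of $\kappa(n)$ upon $\beta_n$. On the other hand, consider the ellipsoid matrix $C(n)/\kappa(n)$.
Dividing numerator and denominator by $\alpha_n(\lambda_n^*)$ yields

$$ \frac{C(n)}{\kappa(n)} = \frac{C(n)/\alpha_n(\lambda_n^*)}{\kappa(n)/\alpha_n(\lambda_n^*)} = \frac{C(n-1) + q_n(\lambda_n^*)\,x(n)x^H(n)}{\kappa(n-1) + q_n(\lambda_n^*)\,\gamma_n - q_n(\lambda_n^*)\, \dfrac{\| \varepsilon(n,\Theta(n-1)) \|^2}{1 + q_n(\lambda_n^*)\,G(n)}} \tag{32} $$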


which reveals that $C(n)/\kappa(n)$, and hence $\mu_v(n)$ and $\mu_t(n)$, depend only on $\rho_n^* = q_n(\lambda_n^*)$ and not on
particular choices of $\alpha_n$ and $\beta_n$. By similar means, it can be argued that the quantities $G(n)\kappa(n-1)$
and $H(n)I(n-1)\kappa(n-1)$, and, hence, $a_0$ and $b_0$ of Theorem 1, are independent of $\alpha_n$ and $\beta_n$. By
the remarks under Theorem 1, it is therefore seen that the selection of points does not depend on
$\{\alpha_n(\lambda_n)\}$ nor $\{\beta_n(\lambda_n)\}$.

Now consider the $\kappa$ minimization policy. We provide a counterexample to the claim that the
minimum value of $\kappa(n)$, the estimate $\Theta(n)$, and the selection or rejection of data, are all independent
of the choice of function sequences. At time $n$, for a given $C(n-1)$, $\kappa(n-1)$, and $\varepsilon(n,\Theta(n-1))$,
suppose that $\| \varepsilon(n,\Theta(n-1)) \|^2 > \gamma_n - \kappa(n-1)$, but $\| \varepsilon(n,\Theta(n-1)) \|^2 < \gamma_n$. We shall show
later in the paper that if $\alpha_n(\lambda_n) = 1$ and $\beta_n(\lambda_n) = \lambda_n$, then $\lambda_n^* > 0$ does not exist; whereas if
$\alpha_n(\lambda_n) = 1 - \lambda_n$ and $\beta_n(\lambda_n) = \lambda_n$, then $0 < \lambda_n^* < 1$ may exist³. Therefore, under the first choice of
functions, the point will certainly be rejected; whereas with the second, it may be accepted. The
resulting estimate $\Theta(n)$, and value $\kappa(n)$, can therefore be different under the different choices of
$\alpha_n$ and $\beta_n$. □
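The counterexample can be illustrated numerically with the recursion (29). The sketch below is a minimal illustration with made-up values chosen so that $\gamma_n - \kappa(n-1) < \|\varepsilon\|^2 < \gamma_n$. It searches a grid on $(0,1)$ rather than solving for a root, and the name `kappa_next` is hypothetical.

```python
def kappa_next(alpha, beta, kappa_prev, gamma, eps2, G):
    """kappa(n) from the recursion (29) for given weights."""
    return (alpha * kappa_prev + beta * gamma
            - alpha * beta * eps2 / (alpha + beta * G))


# Illustrative values: gamma - kappa_prev = 0 < eps2 = 0.5 < gamma = 1
kappa_prev, gamma, eps2, G = 1.0, 1.0, 0.5, 1.0
grid = [i / 1000 for i in range(1, 1000)]

# First choice of functions: alpha = 1, beta = lambda
kappa_a = [kappa_next(1.0, lam, kappa_prev, gamma, eps2, G) for lam in grid]

# Second choice of functions: alpha = 1 - lambda, beta = lambda
kappa_b = [kappa_next(1.0 - lam, lam, kappa_prev, gamma, eps2, G)
           for lam in grid]

# Does any positive weight on the grid reduce kappa below kappa(n-1)?
improves_a = min(kappa_a) < kappa_prev
best_kappa_b, best_lam_b = min(zip(kappa_b, grid))
```

With these values the first choice never reduces $\kappa$ on the grid, so the point would be rejected, while the second choice reaches a smaller $\kappa$ at an interior weight, so the point would be accepted.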

We  now  turn  to  the  consideration  of  specific  algorithms  which  have  been  used  in  practice.  In 
addition  to  showing  that  these  methods  are  quickly  unified  under  the  UOBE  framework,  one  of 
the  main  themes  will  be  to  explore  the  apparent  tradeoff  between  interpretability  and  convergence 
which seems to exist in the currently employed methods. The UOBE paradigm will contribute to the
understanding of this relationship.

3  The  “Landmark”  OBE  Algorithms 

It  is  the  purpose  of  this  section  to  enumerate  instances  of  the  UOBE  algorithm  which  have  been 
used  in  practice.  These  algorithms  have  each  arisen  for  a  different  reason  and  the  unification  of 
existing  methods  has  not  been  appreciated  nor  explored  because  their  original  developments  seem 
somewhat  disparate.  However,  in  light  of  the  UOBE  framework,  it  is  natural  to  inquire  to  what 
extent  the  various  algorithms  are  truly  serving  distinctly  different  purposes.  This  inquiry  is  the 
subject  of  the  next  section  of  the  paper. 

We  shall  make  no  attempt  to  formally  reconstruct  original  developments.  Rather,  in  this 
section we simply distinguish the methods by specification of their sequences $\{\alpha_n(\lambda_n)\}$, $\{\beta_n(\lambda_n)\}$,
and  optimization  criteria.  The  reader  is  referred  to  the  original  papers  for  a  clearer  understanding 
of  the  history  and  motivations  for  the  different  algorithms. 

Three principal OBE algorithms have been studied extensively. These are the Fogel-Huang OBE
(F-H/OBE) algorithm [4] (originally called simply “OBE”), the set-membership weighted recursive
least squares (SM-WRLS) algorithm [10],[5], and the Dasgupta-Huang OBE (D-H/OBE) algorithm
[14].  Their  differences  in  terms  of  the  UOBE  framework  are  shown  in  Table  1.  To  these  three  basic 
versions,  we  have  added  a  fourth  algorithm  which  has  been  developed  recently  by  the  authors, 
the  set-membership  stochastic  approximation  (SM-SA)  algorithm  [15].  In  the  ensuing  discussion, 
SM-SA  will  be  found  to  be  related  to  its  predecessors  in  some  interesting  ways.  We  have  also  noted 
a  heretofore  unpublished  variation  on  SM-WRLS,  Dual  SM-WRLS,  which  will  be  found  to  exhibit 
some  useful  numerical  properties. 

We should also remark that our focus in this paper is principally upon the identification of
time-invariant systems in which the components of the disturbance vectors $\varepsilon_*(n)$ are independent
and are each orthogonal sequences. Explicitly adaptive UOBE algorithms have been discussed in [16]-[19].
A discussion of colored noise issues for a D-H/OBE-like algorithm is found in [20], and for
a more general class of algorithms in [21],[22].

³ Because of the weighting strategy, $\lambda_n^*$ must be constrained in this case to the interval (0, 1).

Table 1: Specification of Existing UOBE Algorithms

Algorithm        $\alpha_n(\lambda_n^*)$    $\beta_n(\lambda_n^*)$      Optimization
F-H/OBE          $1/\kappa(n-1)$            $\lambda_n^*/\gamma_n$      $\mu_v(n)$ or $\mu_t(n)$
SM-WRLS          $1$                        $\lambda_n^*$               $\mu_v(n)$ or $\mu_t(n)$
Dual SM-WRLS     $\lambda_n^*$              $1$                         $\mu_v(n)$ or $\mu_t(n)$
D-H/OBE          $1 - \lambda_n^*$          $\lambda_n^*$               $\kappa(n)$
SM-SA            [illegible]                [illegible]                 $\mu_v(n)$ or $\mu_t(n)$
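The legible rows of Table 1 can be written down as weight functions, together with the inverse of the ratio mapping $q_n = \beta_n/\alpha_n$ used in Theorem 1. This is a minimal sketch, assuming scalar positive inputs. The function names are hypothetical, and SM-SA is omitted because its table entries are not recoverable here.

```python
def weights(algorithm, lam, kappa_prev, gamma):
    """Return (alpha_n, beta_n) for a given weight lam, following Table 1."""
    if algorithm == "F-H/OBE":
        return 1.0 / kappa_prev, lam / gamma
    if algorithm == "SM-WRLS":
        return 1.0, lam
    if algorithm == "Dual SM-WRLS":
        return lam, 1.0
    if algorithm == "D-H/OBE":
        return 1.0 - lam, lam
    raise ValueError("unknown algorithm: " + algorithm)


def weight_from_ratio(algorithm, rho, kappa_prev, gamma):
    """Invert q_n = beta_n / alpha_n for the volume and trace algorithms."""
    if algorithm == "F-H/OBE":
        return rho * gamma / kappa_prev   # q = lam * kappa(n-1) / gamma
    if algorithm == "SM-WRLS":
        return rho                        # q = lam
    if algorithm == "Dual SM-WRLS":
        return 1.0 / rho                  # q = 1 / lam
    raise ValueError("no inverse defined in this sketch for: " + algorithm)


# Illustrative values only: one optimal ratio, three different weights
rho_star, kappa_prev, gamma = 0.8, 1.5, 2.0
lams = {name: weight_from_ratio(name, rho_star, kappa_prev, gamma)
        for name in ("F-H/OBE", "SM-WRLS", "Dual SM-WRLS")}
```

Each algorithm reports a different numerical weight for the same optimal ratio, which is the sense in which the volume and trace algorithms differ only in parameterization.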

4  Discussion and Comparative Analysis of Existing UOBE Algorithms

4.1  Volume  and  Trace  Minimizing  Algorithms 

F-H/OBE and SM-WRLS. F-H/OBE represents the first major journal paper on the application
of ellipsoid algorithms to parametric LP models. The entirely nonintuitive sequences $\{\alpha_n(\lambda_n)\}$
and $\{\beta_n(\lambda_n)\}$ used in F-H/OBE are the consequence of the algorithmic approach taken rather than
deliberate choices of the functions (see [4]). In fact, F-H/OBE was developed using a geometric
approach which attempts to optimally bound with a new hyperellipsoid, $\bar{\Omega}(n)$, the intersection of the
existing ellipsoid, $\bar{\Omega}(n-1)$, and the feasible set implied by the incoming data set. It is interesting
to note that, because the weighting sequence $\{\alpha_n(\lambda_n)\}$ is equivalent to the sequence $\{\kappa^{-1}(n-1)\}$ in
F-H/OBE, the ellipsoid matrix at time $n$, $C(n)/\kappa(n)$, is identical to the scaled covariance matrix
$C_\alpha(n) = \alpha_{n+1}C(n)$ whose inverse is computed directly in the course of the recursion (6).

As an aside, we note that the volume optimization version of F-H/OBE is “suboptimal” in the
sense that it may sometimes result in ellipsoids which are optimal in the prescribed sense, but are
unnecessarily large according to certain simple arguments. Belforte and Bona have suggested
a remedy in [23] (see also [24]). As pointed out by Walter and Piet-Lahanier [1], the modified
procedure is equivalent to the ellipsoid with parallel cuts algorithm developed by researchers working
in linear programming.
Even though Fogel and Huang clearly state in their 1982 paper that there is an LSE problem
underlying F-H/OBE, the geometric approach tends to draw attention away from its presence. The
approach notwithstanding, the similarity of the F-H/OBE equations to “nonadaptive”
WRLS (i.e., without the $\{\alpha_n\}$ sequence) is striking, and it has not gone unnoticed in the literature
[1],[10],[16]. The paper by Norton and Mo [16], which treats adaptive OBE processing, uses the
WRLS framework and implicitly suggests the basis for the UOBE approach taken here. The key to
recognizing the potential for an unlimited variety of UOBE algorithms under the WRLS umbrella
is the recognition of Fogel and Huang’s $\kappa^{-1}(n-1)$ parameter as an unusual “forgetting factor” $\alpha_n$.
Until recently, however, this uniformity of ellipsoid algorithms was not fully appreciated. In the
early and mid 1980s, Deller and students (early papers cited, e.g., in [5]) recognized the similarity
of F-H/OBE to RLS, and attempted to associate an ellipsoid directly with WRLS rather than
conversely. The result is SM-WRLS, which is so named to emphasize the nature of the approach.

While developed very differently, it can be appreciated that F-H/OBE and SM-WRLS are very
similar, the most significant difference being the choice of the “unoptimized” sequence $\{\alpha_n(\lambda_n)\}$.
Indeed, since we know that the parameter estimates and minimization measures will be identical
in the two cases, there is little in the way of theoretical consideration to commend one over the
other. No practical considerations are known which indicate a preference. It is true that modifying
the $\{\alpha_n(\lambda_n)\}$ and $\{\beta_n(\lambda_n)\}$ sequences used in F-H/OBE will destroy the original geometrical
interpretation of the algorithm. However, it is not clear that this interpretation has any practical
significance. With this disclaimer, we note that the extensions of SM-WRLS discussed below apply
in similar ways to F-H/OBE and other algorithms in this class.

The Dual SM-WRLS algorithm [22] has arisen out of an important practical consideration. It
is apparent from (8) that, if $\alpha_n = \alpha_n(\lambda_n)$ tends to be not less than unity,
$\{\beta_n = \beta_n(\lambda_n)\}$ must be a generally increasing sequence for incoming data to
have any impact on the estimate, particularly as $n$ becomes large. For SM-WRLS, this fact
frequently leads to huge numbers in the computations and the potential for numerical
instabilities. This problem does not occur when the sequence $\{\alpha_n = \alpha_n(\lambda_n)\}$
is optimized. In fact, in unpublished simulation studies (with the volume criterion) we have found
the weight sequence to remain nicely bounded. Interestingly, but not unexpectedly (Corollary 1),
if SM-WRLS is run on the same data, first with the $\beta_n$ weights optimized, then with the
$\alpha_n$ weights optimized, identical data are selected in each case by the set-membership
considerations, and identical estimates result from the two approaches.

F-H/OBE  and  SM-WRLS  have  been  successfully  applied  to  the  identification  of  simulated  and 
real systems (e.g. [3], [11], [4], [5]). The recent discovery of the “dual” optimization concept
promises to add a new meritorious feature to these algorithms: improved numerical stability. The
central benefit of the general classes of algorithms represented by these methods is the
meaningfulness of the optimization process. The main deficiency of these groups of algorithms has
been their lack of well-understood convergence properties. This problem led to the development of
the D-H/OBE
algorithm  to  which  we  turn  below.  Before  doing  so,  however,  we  address  the  issue  of  whether 
volume  and  trace  algorithms  converge.  An  affirmative  answer  to  this  question  is  most  desirable 
because  it  would  combine  the  desired  features  of  interpretability  and  convergence. 

Convergence  of  Volume  and  Trace  Algorithms.  While  our  immediate  discussion  is  focusing 
on  existing  UOBE  algorithms,  the  work  in  earlier  sections  of  this  paper  renders  the  following 
applicable  to  virtually  any  algorithm  which  minimizes  volume  or  trace.  We  restrict  our  attention 
to the case in which the components of the disturbance vectors $\varepsilon_*(n)$ are independent
and are each orthogonal sequences. A discussion of colored noise issues is found in [21], [22].

One of the alluring aspects of having interpreted the general UOBE algorithm as a WRLS
algorithm with a bounded error “overlay” is that the convergence properties of the estimate
resulting from the basic RLS algorithm ($\alpha_n(\lambda_n) = \beta_n(\lambda_n) = 1$ for all $n$)
are well known. In the RLS case, if the sequence $\{\varepsilon_*(\cdot)\}$ is wide-sense
stationary, second moment ergodic almost surely (a.s.), white noise, then the RLS estimator
$\Theta(\cdot)$ will converge asymptotically to $\Theta_*$, a.s. (e.g. [6], [8]). However, this
well-known convergence result falls far short of a convergence proof for the UOBE algorithms under
consideration, which use vastly different data weighting strategies. A simple inclusion of the
sequence $\{\alpha_n(\lambda_n) = \alpha\}$ with $0 < \alpha < 1$ (with
$\{\beta_n(\lambda_n) = 1\}$), for example, has been shown to lead to inconsistent asymptotic
estimates [25]. Likewise, we may even assert a.s. convergence of the RLS estimate, albeit to a
bias, when $\{\varepsilon_*(\cdot)\}$ is colored and persistently exciting [26]. Again, while this
result does not provide proof of estimator convergence for the present UOBE cases, the UOBE
estimate has been found to practically converge, expectedly to a bias [21].

More generally, it would be interesting to have a precise understanding of the asymptotic
behavior of the hyperellipsoidal feasible set, especially in the case of colored noise.
Unfortunately, convergence proofs for the volume and trace minimization algorithms are not known.
The original OBE paper by Fogel and Huang [4] is sometimes misunderstood to indicate the
convergence of the bounding ellipsoid to a point under ordinary conditions on
$\{\varepsilon_*(\cdot)\}$. In fact, the F-H paper only proves this convergence for ordinary RLS,
so that the fundamental optimization process is not taken into account.

Whereas  no  known  convergence  proof  for  either  the  estimator  or  the  feasible  set  exists  for 
any  volume  or  trace  algorithm,  a  recent  result  indicates  some  theoretical  support  for  the  favorable 
convergence behavior of these methods which is observed in practice. The following has been proven
for a specific UOBE class ($\alpha_n(\lambda_n)$ sequence chosen arbitrarily,
$\beta_n(\lambda_n) = \lambda_n$ optimized) in [21]. By arguments in this paper, the more general
result follows:


Theorem 3 For any UOBE algorithm in which $\mu_v$ is minimized, if there exists
$\lambda_n^* > 0$, then there also exists a large neighborhood of weights around $\lambda_n^*$,
say $\mathcal{N}_{\lambda^*}$ (including, e.g., all $\lambda_n$ such that
$0 < \lambda_n < \lambda_n^*$), such that if $\lambda_n \in \mathcal{N}_{\lambda^*}$ is used,
$\mu_v(n) < \mu_v(n-1)$.

Though  it  has  not  been  formally  proven,  it  is  likely  that  a  similar  result  pertains  to  trace  algorithms 
as  well. 

Theorem 3 indicates that the ellipsoid volume will tend to some unspecified size in some
unspecified manner. If we consider the ratio

$$\nu(n) = \frac{\mu(n)}{\mu(n-1)} \tag{33}$$

(where $\mu(n)$ means either $\mu_v(n)$ or $\mu_t(n)$), for example, the rate at which $\nu(n)$
approaches unity will determine the convergence behavior of the ellipsoid. Suppose, for example,
that

$\nu(n) \sim$ [right-hand side not legible in the source]   (34)

In this case

$$\lim_{n\to\infty} \mu(n) = \mu(0) \lim_{n\to\infty} \prod_{\tau=1}^{n} \nu(\tau) \neq 0. \tag{35}$$

On the other hand, if

$$\nu(n) \sim 1 - \frac{1}{n} = \frac{n-1}{n}, \tag{36}$$

then,

$$\lim_{n\to\infty} \mu(n) = \mu(1) \lim_{n\to\infty} \prod_{\tau=2}^{n} \frac{\tau-1}{\tau}
  = \mu(1) \lim_{n\to\infty} \frac{1}{n} = 0. \tag{37}$$

This  result  has  not  been  clearly  understood,  and  its  finding  offers  some  hope  that  a  proof  of 
convergence  (in  some  sense)  for  the  volume  and  trace  algorithms  may  be  found  in  the  white  noise 
case. 
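The dependence of the limit on how fast the ratio approaches unity can be checked numerically. The sketch below is only an illustration: the first ratio is the one in (36), while the second, $1 - 1/n^2$, is a hypothetical choice made here (it is not taken from the text) whose partial products are known to equal $(n+1)/(2n)$ and therefore do not vanish.

```python
def size_after(ratio, n_max, mu1=1.0):
    """Return mu(n_max) = mu(1) * prod_{tau=2}^{n_max} ratio(tau)."""
    mu = mu1
    for tau in range(2, n_max + 1):
        mu *= ratio(tau)
    return mu


# Ratio (36): nu(n) = 1 - 1/n. The product telescopes to 1/n, so mu(n) -> 0.
slow = size_after(lambda n: 1.0 - 1.0 / n, 10_000)

# Hypothetical ratio (assumption for illustration): nu(n) = 1 - 1/n**2.
# The partial products equal (n + 1) / (2 n), so mu(n) -> mu(1) / 2, not 0.
fast = size_after(lambda n: 1.0 - 1.0 / n**2, 10_000)

print(slow, fast)
```

Both ratios tend to unity, yet only the slower one drives the size measure to zero, which is the distinction drawn between (35) and (37).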

It  has  frequently  been  noted  that  the  hyperellipsoidal  bounding  sets  resulting  from  UOBE 
algorithms  can  be  quite  “loose”  supersets  of  the  exact  feasibility  sets  (polytopes)  (e.g.  [27], [28]), 
particularly in “finite” time⁴. However, many simulation studies in the literature (white noise
case) have shown the volume of the ellipsoids to become quite small in the “long term.” Further, as
we and other researchers have demonstrated, the empirical convergence and tracking properties
of the UOBE estimator are favorable in spite of the few data used. This is an indication that
the presence of the ellipsoid, and the optimization procedure centered on it, are quite useful for
parameter identification, regardless of our present inability to completely understand its behavior
in theory. Theorem 3 offers further support for “good behavior” of this class of algorithms.

⁴ Norton has proposed the use of inner bounds as a possible remedy for this problem [28], [29].

SM-SA. The SM-SA algorithm provides an interesting “bridge” between the present discussion
and that of D-H/OBE to follow. SM-SA represents the authors’ pursuit of a converging,
interpretable UOBE algorithm. The algorithm is so named because it is equivalent in form to the
so-called stochastic approximation (SA) algorithm (e.g. [30]) as we discuss below. Our work has
shown that a feature which tends to promote convergence of the ellipsoid is the prevention of
“drifting” of the covariance matrix toward infinity. In an unweighted RLS algorithm, this problem
is eliminated by normalizing the covariance matrix to the time $n$, that is, by replacing $C(n)$ by
$(1/n)C(n)$. In principle, if the sequence $\{x(\cdot)\}$ represents a stationary stochastic
process with appropriate ergodicity properties, then $(1/n)C(n)$ will tend to
$\mathcal{E}\{x(n)x^T(n)\}$. Clearly, however, this strategy may not work for weights determined by
SM considerations, as $C(n)$ may grow much faster than $n$. In the SM-SA approach, SM-WRLS with
either volume or trace minimization is modified so that the covariance matrix is normalized to the
sum of the weights, say $\Lambda_n^* = \sum_{i=1}^{n} \lambda_i^*$. Accordingly,

$$C(n) = \frac{\Lambda_{n-1}^*}{\Lambda_n^*}\, C(n-1) + \frac{\lambda_n^*}{\Lambda_n^*}\, x(n)x^T(n). \tag{38}$$

Since $\Lambda_n^* = \Lambda_{n-1}^* + \lambda_n^*$, we find that

$$\alpha_n(\lambda_n^*) = \frac{\Lambda_{n-1}^*}{\Lambda_{n-1}^* + \lambda_n^*} \tag{39}$$

$$\beta_n(\lambda_n^*) = \frac{\lambda_n^*}{\Lambda_{n-1}^* + \lambda_n^*}. \tag{40}$$
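A minimal sketch of this normalization follows, assuming the recursion has the form $C(n) = \alpha_n C(n-1) + \beta_n x(n)x^T(n)$ with $\alpha_n$ equal to the previous weight sum divided by the new weight sum and $\beta_n$ equal to the new weight divided by the new weight sum. The data vectors and weights are made up for illustration, and a zero weight stands for a rejected data set. The point is that the recursion reproduces the weighted average of outer products, so the matrix cannot drift toward infinity.

```python
def outer(x):
    """Outer product x x^T of a vector stored as a list."""
    return [[xi * xj for xj in x] for xi in x]


def sm_sa_covariance(xs, lams):
    """Run C(n) = alpha_n C(n-1) + beta_n x(n) x(n)^T with normalized weights."""
    m = len(xs[0])
    C = [[0.0] * m for _ in range(m)]
    weight_sum = 0.0  # running sum of the accepted weights
    for x, lam in zip(xs, lams):
        if lam <= 0.0:
            continue  # rejected data set: C is left unchanged
        total = weight_sum + lam
        alpha, beta = weight_sum / total, lam / total
        xxT = outer(x)
        C = [[alpha * C[i][j] + beta * xxT[i][j] for j in range(m)]
             for i in range(m)]
        weight_sum = total
    return C, weight_sum


# Illustrative (made-up) regressors and weights.
xs = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0], [-2.0, 1.0]]
lams = [0.7, 0.0, 2.5, 1.2]
C, weight_sum = sm_sa_covariance(xs, lams)
print(C, weight_sum)
```

Because $\alpha_n + \beta_n = 1$ at every accepted step, each update is a convex combination of the previous matrix and the incoming outer product.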

While  the  ellipsoid  associated  with  the  SM-SA  algorithm  has  not  been  proven  to  converge,  the 
method  is,  of  course,  subject  to  the  volume  (and  trace)  contraction  rule  specified  by  Theorem  3. 
However, unlike the F-H/OBE and SM-WRLS algorithms, conditions under which the estimator,
$\Theta(n)$, converges to $\Theta_*$ can be clearly stated in this case⁵.


Theorem 4 Sufficient conditions for convergence of the SM-SA estimator in the sense

$$\lim_{n\to\infty} \Theta(n) = \Theta_* \tag{41}$$

are

$$\beta_n(\lambda_n^*) > 0, \qquad \sum_{\tau=1}^{\infty} \beta_\tau(\lambda_\tau^*) = \infty \tag{42}$$

$$\sum_{\tau=1}^{\infty} \beta_\tau^2(\lambda_\tau^*) < \infty. \tag{43}$$

⁵ We shall find that these conditions also apply to D-H/OBE.

Sketch of Proof: The function sequences $\{\alpha_n(\lambda_n)\}$ and $\{\beta_n(\lambda_n)\}$ can
be replaced by the following, and the optimization done over $\lambda_n$, without affecting the
optimization:

$$\hat{\alpha}_n(\lambda_n) = 1 - \lambda_n \tag{44}$$

$$\hat{\beta}_n(\lambda_n) = \lambda_n. \tag{45}$$

This follows immediately from the fact that the ratios
$\rho_n(\lambda_n^*) = \beta_n(\lambda_n^*)/\alpha_n(\lambda_n^*)$ and
$\hat{\rho}_n(\hat{\lambda}_n^*) = \hat{\beta}_n(\hat{\lambda}_n^*)/\hat{\alpha}_n(\hat{\lambda}_n^*)$
must be equal (to, say, $\rho_n^*$) according to Theorem 1, from which
$\hat{\beta}_n(\hat{\lambda}_n^*) = \beta_n(\lambda_n^*) = \rho_n^*/(1 + \rho_n^*)$ and
$\alpha_n(\lambda_n^*) = \hat{\alpha}_n(\hat{\lambda}_n^*) = 1/(1 + \rho_n^*)$. The WRLS algorithm
with the weighting strategy given by (44) and (45) is frequently referred to as the SA algorithm
for identifying linear parametric models. The work of Robbins and Monro [31] and Blum [32] on the
SA algorithm results in the sufficient conditions of the theorem. □
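The ratio argument in the proof sketch can be illustrated with a few numbers. The sketch below assumes SM-SA weights of the form "previous weight sum over new weight sum" and "new weight over new weight sum", maps them to the one-parameter pair $(1 - \hat\lambda, \hat\lambda)$ through the common ratio, and then looks at the special case of constant unit weights, for which $\beta_n = 1/n$. The function names and numerical values are hypothetical choices made here for illustration.

```python
def sm_sa_weights(weight_sum_prev, lam):
    """(alpha_n, beta_n) normalized by the running weight sum."""
    total = weight_sum_prev + lam
    return weight_sum_prev / total, lam / total


def sa_weights(lam_hat):
    """(alpha_n, beta_n) for the simpler strategy (44)-(45)."""
    return 1.0 - lam_hat, lam_hat


alpha, beta = sm_sa_weights(weight_sum_prev=4.0, lam=1.0)
rho = beta / alpha                 # common ratio beta / alpha
lam_hat = rho / (1.0 + rho)        # weight giving the same ratio under (44)-(45)
alpha_hat, beta_hat = sa_weights(lam_hat)
print(alpha, beta, alpha_hat, beta_hat)

# With constant unit weights, beta_n = 1/n: the partial sums grow without bound
# (harmonic series) while the sums of squares stay below pi^2 / 6.
betas = [sm_sa_weights(n - 1.0, 1.0)[1] for n in range(1, 10_001)]
partial_sum = sum(betas)
partial_sum_sq = sum(b * b for b in betas)
print(partial_sum, partial_sum_sq)
```

The unit-weight case is only one sequence that meets the summability conditions; weights chosen by set-membership considerations need not behave this way.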

Let  us  henceforth  adopt  the  simpler  weighting  strategy  (44)  and  (45)  for  SM-SA.  Remarkably, 
the  weighting  strategy  that  emerges  here  is  identical  to  the  Dasgupta-Huang  weighting,  resulting  in 
the  same  convex  combination  of  past  covariance  matrix  and  incoming  outer  product.  However,  this 
weighting  strategy  arises  in  a  very  different  context  in  which  the  objective  is  to  minimize  the  volume 
or trace of the hyperellipsoid at each step, if such can still be accomplished. Indeed, there is an
ellipsoid associated with D-H/OBE at each step, but the “usual” measures of its size are ignored in
the optimization process, sacrificing interpretability. On the other hand, SM-SA does not inherit
the convergence properties of D-H/OBE (described below). Unlike previous interpretable UOBE
algorithms, however, SM-SA does have known conditions for estimator convergence. Further, SM-SA
also exhibits the desirable property of “covariance boundedness” which we conjecture will be
required  for  a  volume  or  trace  algorithm  to  converge  in  the  set  theoretic  sense  of  D-H/OBE. 

4.2 D-H/OBE and the Issue of $\kappa$ Minimization

In the work above, we have discussed the fact that volume and trace UOBE algorithms are
interpretable with respect to their principles of operation, but lacking in well-understood
convergence properties. Some significant progress on the convergence issue is cited, and it seems
likely that the elusive convergence proof for at least some classes of volume and trace algorithms
will be discovered. In this section, we briefly examine the problem from the “other direction.”
Given the
D-H/OBE  algorithm  with  its  desirable  convergence  proof,  can  it  be  shown  that  this  algorithm  is 
actually  performing  according  to  “interpretable”  principles? 

Of the fundamental variations on UOBE, D-H/OBE is the most recent to be published. The
technique is unlike all other existing methods in the use of $\kappa$ minimization. This
minimization approach, in conjunction with weighting strategy (44) and (45), provides the means
with which to prove asymptotic and exponential convergence of the ellipsoid, and cessation of
updating, using Lyapunov theory. From an analytical point of view, the reason for the choice of the
$\kappa$ optimization criterion is that $\kappa(n)$ is a bound on the Lyapunov function used in the
minimization at time $n$, and the convergence of the Lyapunov function is used to prove convergence
of the algorithm. Upon convergence, the residuals, $\varepsilon(\cdot,\Theta(\cdot))$, are
guaranteed to remain in the “dead zone” indicated by the error bounds, i.e., as $\tau \to \infty$,
$\|\varepsilon(\tau,\Theta(\tau-1))\|^2 < \gamma_\tau$.

From an interpretive point of view, however, diminishing $\kappa(n)$ is not clearly helpful because
its magnitude is not clearly related to the “size” of the set $\Omega(n)$. Dasgupta and Huang [14]
argue simply that $\kappa(n)$ is “a bound on the estimation error,” and should be minimized. Norton
and Mo [16] dispute this claim, writing “[$\kappa(n)$] is not a bound on the parameter error, nor
does it bear a simple relation to it.”

In  this  section,  we  wish  to  determine  whether  D-H/OBE  is,  in  fact,  performing  according  to 
some  interpretable  principles.  To  begin,  let  us  use  Theorem  2  in  conjunction  with  (44)  and  (45)  to 
write  a  quadratic  for  the  optimal  root  at  time  n  for  D-H/OBE, 

$$\begin{aligned}
F_\kappa^{D\text{-}H}(s) ={}& s^2\left[\gamma_n (G(n)-1)^2 - \kappa(n-1)(G(n)-1)^2
      + \|\varepsilon(n,\Theta(n-1))\|^2 (G(n)-1)\right] \\
  &+ 2s\left[(\gamma_n - \kappa(n-1))(G(n)-1) + \|\varepsilon(n,\Theta(n-1))\|^2\right] \\
  &+ \left[\gamma_n - \kappa(n-1) - \|\varepsilon(n,\Theta(n-1))\|^2\right].
\end{aligned} \tag{46}$$

For future reference, let us also write a similar expression for UOBE with
$\beta_n(\lambda_n) = \lambda_n$ and $\alpha_n(\lambda_n) = 1$. This latter case is similar to the
SM-WRLS setup, except that $\kappa$ minimization is used. For this reason we write
“$F_\kappa^{SM\text{-}WRLS}(s)$”:

$$F_\kappa^{SM\text{-}WRLS}(s) = \gamma_n G^2(n) s^2 + 2\gamma_n G(n) s
   + \left(\gamma_n - \|\varepsilon(n,\Theta(n-1))\|^2\right). \tag{47}$$

The reasons for including (47) will become apparent momentarily. Dasgupta and Huang [14] show
that an optimal weight in the sense of minimizing $\kappa(n)$ exists iff⁶

$$\|\varepsilon(n,\Theta(n-1))\|^2 > \gamma_n - \kappa(n-1). \tag{48}$$

Accordingly, this simple and computationally inexpensive ($O(m)$) test may be employed to
determine whether the current data set $(y(n), x(n))$ is useful in the sense of minimizing
$\kappa(n)$. Interestingly, the test (48) is tantamount to testing the zero order coefficient of
the quadratic $F_\kappa^{D\text{-}H}$ for negativity. This is reminiscent of a similar test which
can be performed for any volume or trace algorithm (see remarks below Theorem 1). However, that
checking of the zero order coefficient of $F_\kappa^{D\text{-}H}$ should be a sufficient test for
an optimal weight is not apparent as it is in the volume or trace cases. In particular, this is
because the second order coefficient of $F_\kappa^{D\text{-}H}$ need not be positive. Consequently,
Dasgupta and Huang go to some effort to verify (48) as a test, and a set of rules centered on the
second order coefficient is presented for finding the optimal weight if the test is met⁷. Let us
juxtapose this fact with the following:


Theorem 5 1. Consider the SM-WRLS algorithm. If $\kappa(n)$ is to be minimized at time $n$, a
necessary and sufficient test for the existence of an optimal ($\kappa$-minimizing) weight, say
$\lambda_{n,\kappa}^*$, is that the zero order coefficient of (47) be negative:

$$\|\varepsilon(n,\Theta(n-1))\|^2 > \gamma_n. \tag{49}$$

2. Again consider SM-WRLS. Test (49) is also a sufficient condition for the existence of an
optimal volume ($\lambda_{n,v}^*$) or trace ($\lambda_{n,t}^*$) weight.

3. Item 2 is true for any volume or trace minimizing UOBE algorithm.

Sketch of Proof: Item 1 is proven in [3], [17]. (One key feature of $F_\kappa^{SM\text{-}WRLS}$
that facilitates this result is that its second order coefficient is always positive. This is not
true of $F_\kappa^{D\text{-}H}$.) Item 3 follows from the fact that, if (49) holds, then $a_0$ of
$F_v$ and $b_0$ of $F_t$ (see Theorem 1) are both negative. Now see the remarks under Theorem 1.
Item 2 is a special case of 3. □
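To make Item 1 concrete, the sketch below treats the SM-WRLS quadratic as $\gamma G^2 s^2 + 2\gamma G s + (\gamma - \|\varepsilon\|^2)$ and assumes that the $\kappa$-minimizing weight is its positive root. With $\gamma > 0$ and $G > 0$ the roots are $(\pm\sqrt{\|\varepsilon\|^2/\gamma} - 1)/G$, so a positive root exists exactly when the zero order coefficient is negative. The argument names and numerical values are illustrative only; $G$ stands for the scalar $G(n)$ defined earlier in the paper.

```python
import math


def sm_wrls_quadratic(s, gamma, eps_sq, G):
    """Evaluate gamma G^2 s^2 + 2 gamma G s + (gamma - eps_sq)."""
    return gamma * G * G * s * s + 2.0 * gamma * G * s + (gamma - eps_sq)


def sm_wrls_kappa_weight(gamma, eps_sq, G):
    """Positive root of the quadratic, or None when the test eps_sq > gamma fails.

    Assumes gamma > 0 and G > 0.
    """
    if eps_sq <= gamma:
        return None  # zero order coefficient is not negative: discard the data
    return (math.sqrt(eps_sq / gamma) - 1.0) / G


weight = sm_wrls_kappa_weight(gamma=1.0, eps_sq=4.0, G=0.5)
rejected = sm_wrls_kappa_weight(gamma=1.0, eps_sq=0.8, G=0.5)
print(weight, rejected)
```

Since the second order and first order coefficients are both positive here, the sign of the zero order coefficient alone decides whether a positive root exists, which is why the test is both necessary and sufficient in this case.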

The point of including Theorem 5 is to illustrate one case (SM-WRLS) in which $\kappa$ minimization
has many implications for interpretable performance. Indeed, (49) is a very powerful test. It is
an indicator that not only $\kappa(n)$, but also either of the other two (interpretable) measures
can be minimized at time $n$ for SM-WRLS. Further note that, due to Theorem 3, $\kappa$
minimization implies likely volume decrease (though not optimally) as well. Coincidentally, (49)
can also be (suboptimally, since it is only a sufficient condition) used to test whether volume or
trace can be minimized at time $n$ for any UOBE algorithm⁸. We must not lose sight of the very
important fact that convergence has only been proven for D-H/OBE, and none of these findings does
anything to change that fact.

⁶ Their work is carried out for the one-dimensional case.

⁷ Actually, a simpler rule is available. Because of the weighting strategy, $\lambda_n$ must be in
the interval (0,1). A little thought will indicate that, once (48) is met,
$F_\kappa^{D\text{-}H}(s) = 0$ can only have a root on (0,1) if $F_\kappa^{D\text{-}H}(1) > 0$. If
this is the case, then the quadratic equation can be used to find the root. Otherwise
$\lambda_n^*$ is taken to be zero or some predetermined number on (0,1), depending on the relative
magnitudes of $F_\kappa^{D\text{-}H}(0)$ and $F_\kappa^{D\text{-}H}(1)$.

Obviously,  the  next  step  is  to  inquire  whether  the  D-H  test  (48)  has  similar  implications  for 
interpretable  measures.  Unfortunately,  the  answer  appears  to  be  no.  Whereas  (49)  is  equivalent  to 
testing whether $a_0 + K_v < 0$ or $b_0 + K_t < 0$ with both $K_v$ and $K_t$ positive, (48) is only
equivalent to testing whether $a_0 + K_v < 0$ or $b_0 + K_t < 0$, where neither $K_v$ nor $K_t$ is
necessarily positive. Unfortunately, the truth of (48) is therefore not sufficient to assure that
$a_0$ and $b_0$ are negative. If
it  were  additionally  known  that 

G(n)  >  mk,  (50) 

then (48) would be a sufficient condition for the existence of $\lambda_{n,v}^*$ and
$\lambda_{n,t}^*$ in the D-H case, and an indicator that any weight, even
$\lambda_{n,\kappa}^*$, would likely diminish the volume. Because of the weighting
strategy  used  in  D-H/OBE,  however,  (50)  does  not  hold  in  general.  While  some  heuristic  arguments 
can  be  made  indicating  circumstances  under  which  (50)  might  be  true,  support  for  the  notion  that 
the  D-H  test  might  be  similar  to  a  volume  or  trace  test  is  very  weak  in  these  terms. 
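The D-H test and the simpler root rule mentioned in the footnote can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes the quadratic has the form $a_2 s^2 + 2 a_1 s + a_0$ with $a_2 = (\gamma-\kappa)(G-1)^2 + \|\varepsilon\|^2 (G-1)$, $a_1 = (\gamma-\kappa)(G-1) + \|\varepsilon\|^2$, and $a_0 = \gamma - \kappa - \|\varepsilon\|^2$, where $\kappa$ is the previous value $\kappa(n-1)$. The fallback choice used when no root lies on (0,1) is not modeled, and the function returns `None` in that case. All numerical values are made up.

```python
import math


def dh_coefficients(gamma, kappa_prev, eps_sq, G):
    """Coefficients (a2, a1, a0) of F(s) = a2 s^2 + 2 a1 s + a0 (assumed form)."""
    d = gamma - kappa_prev
    a2 = d * (G - 1.0) ** 2 + eps_sq * (G - 1.0)
    a1 = d * (G - 1.0) + eps_sq
    a0 = d - eps_sq
    return a2, a1, a0


def dh_weight(gamma, kappa_prev, eps_sq, G):
    """Root of F on (0, 1), or None if the data fail the test or no such root exists."""
    a2, a1, a0 = dh_coefficients(gamma, kappa_prev, eps_sq, G)
    if a0 >= 0.0:
        return None  # zero order coefficient not negative: test fails
    if a2 + 2.0 * a1 + a0 <= 0.0:
        return None  # F(1) <= 0: no root on (0, 1); fallback rule not modeled
    if abs(a2) < 1e-12:
        return -a0 / (2.0 * a1)  # quadratic degenerates to a linear equation
    root_disc = math.sqrt(a1 * a1 - a2 * a0)
    for s in ((-a1 + root_disc) / a2, (-a1 - root_disc) / a2):
        if 0.0 < s < 1.0:
            return s
    return None


w = dh_weight(gamma=1.0, kappa_prev=0.5, eps_sq=2.0, G=0.5)
no_update = dh_weight(gamma=1.0, kappa_prev=0.5, eps_sq=0.3, G=0.5)
print(w, no_update)
```

In the first call the second order coefficient is negative, which illustrates why the sign of the zero order coefficient alone is less obviously decisive here than in the SM-WRLS case.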

So, in the analysis above at least, the D-H test comes intriguingly close to being a check for the
existence of $\lambda_{n,v}^*$ or $\lambda_{n,t}^*$ (hence for an indicator that D-H/OBE is
minimizing both $\kappa$ and volume), but falls somewhat short. Again, we have not been able to
find that convergence and interpretability exist in a single UOBE algorithm. However, the
connections that apparently exist between D-H/OBE and more interpretable algorithms offer some hope
that a meaningful interpretation of the
dynamics  of  D-H/OBE  might  ultimately  be  found. 

5  Summary  and  Conclusions 

We have shown that all existing OBE algorithms, and, in fact, a very broad class of OBE
algorithms, can be unified into a single framework which we have called the UOBE algorithm. This
framework is based on generalized WRLS in which very wide classes of “forgetting factors” and data
weights may be employed. Different instances of UOBE are distinguished by their weighting policies
and the criteria used to determine their optimal values.

With the UOBE as a framework for discussion, we then turned our attention to existing
algorithms. The main advantage of those which minimize ellipsoid volume and trace is the ease with
which the performance principles are interpreted. However, to date no volume or trace algorithm
has been formally shown to converge in set-theoretic terms. We have, however, introduced a new
algorithm, SM-SA, for which conditions may be stated for convergence of the estimator. Several
results are presented which offer promise that a proof of set convergence for volume and trace
algorithms will ultimately be found.

⁸ This idea has been employed in [17]-[19] as an efficient way to implement the testing for
real-time applications.

Interestingly, SM-SA uses an equivalent weighting strategy to D-H/OBE, the only published
UOBE algorithm for which set convergence and cessation of updating have been proven. D-H/OBE,
however, uses $\kappa$ minimization, which does not lend itself well to interpretation of algorithm
performance. An inquiry into the interpretability of D-H/OBE yielded some interesting connections
of this method to volume and trace algorithms, but fell short of showing that D-H/OBE in fact
minimizes something meaningful at each step. It was discovered that $\kappa$ minimization can imply
volume or trace minimization, but this was not demonstrated for any converging (ellipsoid)
algorithm.

Hence  the  pursuit  of  an  interpretable,  set  converging  UOBE  algorithm  remains  an  open  issue. 
The  UOBE  framework  developed  in  this  paper  should  be  an  asset  in  the  discovery  of  this  desirable 
algorithm. 
Figure and Table List


Fig.  1:  General  steps  of  the  UOBE  algorithm. 
Table  1:  Specification  of  existing  UOBE  algorithms. 
At time $n$,

1. In conjunction with the incoming data set $(y(n), x(n))$, find optimal values of $\alpha_n$
and/or $\beta_n$, say $\alpha_n^*$ and/or $\beta_n^*$. Optimality criteria are described in the
text;

2. If optimal positive (and sometimes further constrained) values $\alpha_n^*$ and/or $\beta_n^*$
do not exist, then discard the data set (set $\beta_n^* = 0$ and/or $\alpha_n^* = 1$);

3. Update $C(n)$, $\Theta(n)$, and $\kappa(n)$ using (6), (7), and a recursion for $\kappa(\cdot)$
described in Lemma 1.


Figure  1:  General  steps  of  the  UOBE  algorithm. 


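Table 1: Specification of Existing UOBE Algorithms

Algorithm        $\alpha_n(\lambda_n^*)$                                  $\beta_n(\lambda_n^*)$                              Optimization
F-H/OBE          $1/\kappa(n-1)$                                          $\lambda_n^*/\gamma_n$                              $\mu_v(n)$ or $\mu_t(n)$
SM-WRLS          $1$                                                      $\lambda_n^*$                                       $\mu_v(n)$ or $\mu_t(n)$
Dual SM-WRLS     $\lambda_n^*$                                            $1$                                                 $\mu_v(n)$ or $\mu_t(n)$
D-H/OBE          $1 - \lambda_n^*$                                        $\lambda_n^*$                                       $\kappa(n)$
SM-SA            $\Lambda_{n-1}^*/(\Lambda_{n-1}^* + \lambda_n^*)$        $\lambda_n^*/(\Lambda_{n-1}^* + \lambda_n^*)$       $\mu_v(n)$ or $\mu_t(n)$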
I. INTRODUCTION

The two most popular approaches to filtering, identification, prediction, estimation, etc., are the
equation error method and the output error method. The algorithms which are based on these two
methods are often designed to minimize the mean square error. Specifically, they minimize the mean
square equation error (MSEE) or the mean square output error (MSOE). The asymptotic analysis of
these algorithms involves: 1) finding the attracting solution(s), and 2) investigating the local or
global convergence to the solution(s).

Goodwin and Sin [1], and Ljung and Söderström [2] have treated the equation error based algorithms
thoroughly from the perspective of convergence and applications. An attractive feature of the
equation error is its unique minimum MSEE solution regardless of the linear model and the
properties of the input.* However, this property is not shared with the output error method in
general when the model is an infinite impulse response (IIR) filter. But there are sufficient
conditions which guarantee the uniqueness of the minimum MSOE solution in the identification
setting [3], where the model (adaptive filter) can characterize the plant (unknown) completely.

The goal of this paper is to present a weaker sufficient condition than what was presented in [3]
when the input is white noise and the model order exactly matches the order of the plant. This is
ultimately intended to take us a step closer to establishing the necessary and sufficient
conditions for the unimodality of the MSOE surface. In this paper, first a cross-correlation matrix
is introduced in Section II, where some of its properties are outlined. In Section III, these
properties are used to extract the weaker sufficient condition for the uniqueness of the IIR
identifier which would minimize the MSOE.


II.  A  CROSS-CORRELATION  MATRIX 

Consider the $m \times m$ matrix $P$ defined by

$$P(A,C,m) = E\left[\phi_m(n)\,\psi_m^T(n)\right] \tag{1}$$

where

$$\phi_m(n) = \frac{1}{A(q^{-1})}\begin{bmatrix} x(n) \\ \vdots \\ x(n-m+1) \end{bmatrix},
\qquad
\psi_m(n) = \frac{1}{C(q^{-1})}\begin{bmatrix} x(n) \\ \vdots \\ x(n-m+1) \end{bmatrix}.$$

Here, it is assumed that $A(q^{-1})$ and $C(q^{-1})$ are both $N$th order stable polynomials of the
form

$$A(q^{-1}) = 1 + \sum_{i=1}^{N} a_i q^{-i} = \prod_{i=1}^{N} (1 - p_i q^{-1}), \qquad
C(q^{-1}) = 1 + \sum_{i=1}^{N} c_i q^{-i} = \prod_{i=1}^{N} (1 - r_i q^{-1}) \tag{2}$$

with $|p_i| < 1$ and $|r_i| < 1$, for $i = 1, \ldots, N$.

* The input is assumed to be persistently exciting.

Ljung and Söderström [2] encountered the matrix $P$ during the convergence analysis of the
nonsymmetric instrumental variable method (IVM). They argued that $P$ is singular only on a measure
zero set which is determined by $\det(P) = 0$. As a result, it was concluded that the nonsymmetric
IVM converges almost everywhere and that $P$ is generically nonsingular. They also provided the
sufficient condition for nonsingularity of $P$ in [2, Lemma 4.7]. But, since we are only interested
in the case where $x(n)$ is white, let us restate this Lemma.

Lemma 1 [2]:

Assume that $x(n)$ is a white sequence. Then, the matrix $P$ is nonsingular if either

(i) $\dfrac{A(q^{-1})}{C(q^{-1})}$ is strictly positive real,

or

(ii) $m \ge N$.

The positive realness in (i) is such a strong condition that it also guarantees $P$ to be positive
definite. An obvious special case is when $C(q^{-1}) = A(q^{-1})$. The sufficient condition stated
in (ii) is much less restrictive than (i), and we explore it further next. In particular, it is of
considerable interest to know how tight this sufficient condition is. That is, can $m \ge N-1$, or
$m \ge N-2$, etc., replace (ii)? If so, a weaker sufficient condition has been identified.


The Toeplitz matrix $P$, for which no symmetry is assumed, is fully determined by
$g_k$, $1-m \le k \le m-1$, where

$$g_{j-i}(A,C) = P_{ij}(A,C,m)
  = \frac{1}{2\pi \mathrm{j}} \oint_{|z|=1} \frac{z^{\,j-i}}{A(z^{-1})\,C(z)}\, \Phi_x(z)\, \frac{dz}{z},
  \qquad i,j = 1, 2, \ldots, m \tag{3}$$

or, equivalently,

$$P(A,C,x,m)^T = J\,P(A,C,x,m)\,J = P(C,A,x,m) \tag{4}$$

where $\Phi_x(z)$ is the spectral density of $x(n)$ and $J$ is the $m \times m$ exchange matrix

$$J = \begin{bmatrix} 0 & & 1 \\ & ⋰ & \\ 1 & & 0 \end{bmatrix}.$$

The following Lemma presents an identity involving the $N \times N$ matrix $P$. In the sequel,
$P_m$ will be occasionally used to denote $P(A,C,x,m)$ for brevity.


Lemma 2:

If $x(n)$ is a zero-mean white noise sequence with variance $\sigma^2$, then

[Equation (5): a block-matrix identity built from $a$, $c$, $0$, $O$, $I$ and the matrix $P$; its
layout is not recoverable from the source.]   (5)

where $a = [a_1\ a_2\ \cdots\ a_N]^T$ and $c = [c_1\ c_2\ \cdots\ c_N]^T$. Also, $0$ is the
$(N-1)$ zero vector, and $O$ and $I$ are the $(N-1) \times (N-1)$ zero and identity matrices,
respectively.

Proof: See the Appendix. □


One immediate consequence of this result is stated below in Lemma 3 and Theorem 1, which establish
the nonsingularity of $P_{N-1}$ by identifying a direct relationship between $\det(P_N)$ and
$\det(P_{N-1})$.


Lemma 3:

Let us define

$$\Delta_m = \det P(A,C,x,m). \tag{6}$$

Then, we have

(i) $\Delta_N = \dfrac{\sigma^{2N}}{\prod_{i=1}^{N}\prod_{j=1}^{N}(1 - p_i r_j)}$

(ii) $\Delta_{N-1} = \dfrac{1}{\sigma^2}\,(1 - a_N c_N)\,\Delta_N
      = \dfrac{(1 - a_N c_N)\,\sigma^{2(N-1)}}{\prod_{i=1}^{N}\prod_{j=1}^{N}(1 - p_i r_j)}$

Proof: See the Appendix. □

Theorem 1:

The matrix $P$ is nonsingular for $m \ge N-1$.

Proof:

Since $|a_N| < 1$ and $|c_N| < 1$ for the stable polynomials $A(z^{-1})$ and $C(z^{-1})$,
$\Delta_{N-1}$ is nonzero according to Lemma 3(ii). Since $P$ is nonsingular for $m \ge N$
according to Lemma 1, part (ii), the result immediately follows. □
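A quick numerical check of the case $m = N-1$ is sketched below for $N = 2$ and unit-variance white input, under the reading that $\Delta_{N-1} = (1 - a_N c_N)\,\sigma^{2(N-1)} / \prod_{i,j}(1 - p_i r_j)$. For $m = 1$ the matrix reduces to the scalar $E[\phi_1(n)\psi_1(n)]$, which for white unit-variance input equals the sum of products of the impulse responses of $1/A$ and $1/C$. The coefficient values are hypothetical, chosen so that the roots of $C$ are real and easy to write down.

```python
def impulse_response(coeffs, length):
    """Impulse response of 1 / (1 + coeffs[0] q^-1 + coeffs[1] q^-2 + ...)."""
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for k, c in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc -= c * h[n - k]
        h.append(acc)
    return h


def g0(a, c, length=2000):
    """E[(1/A)x(n) * (1/C)x(n)] for white x(n) with unit variance (truncated sum)."""
    return sum(u * v for u, v in zip(impulse_response(a, length),
                                     impulse_response(c, length)))


# Hypothetical stable second-order polynomials (N = 2).
a = [0.4, 0.3]                      # A(q^-1) = 1 + 0.4 q^-1 + 0.3 q^-2
r = [0.5, -0.3]                     # chosen roots of C
c = [-(r[0] + r[1]), r[0] * r[1]]   # C(q^-1) = (1 - r1 q^-1)(1 - r2 q^-1)

# prod_i (1 - p_i r) equals A evaluated at q^-1 = r, i.e. 1 + a1 r + a2 r^2.
denominator = 1.0
for rj in r:
    denominator *= 1.0 + a[0] * rj + a[1] * rj * rj

delta_numeric = g0(a, c)                         # Delta_1 for m = N - 1 = 1
delta_formula = (1.0 - a[1] * c[1]) / denominator
print(delta_numeric, delta_formula)
```

Both numbers are nonzero, as expected when $|a_N c_N| < 1$.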

Theorem 1 presents a weaker sufficient condition for nonsingularity of $P$, namely, $m \ge N-1$.
But, is this the weakest sufficient condition? To answer this question, consider the following
example.

Example I:

Let $N = 3$ and $m = 1$, which signifies the case $m = N-2$. Then, $P$ is a scalar which is given
by

$$P = g_0(A,C) = \sigma^2\,
  \frac{(1 - a_3 c_3)^2 - (a_1 c_3 - c_2)(c_1 a_3 - a_2)}
       {\prod_{i=1}^{3}\prod_{j=1}^{3}(1 - p_i r_j)}.$$

However, the values $c_1 = -2.4$, $c_2 = 1.91$, $c_3 = -0.504$, $a_1 = a_2 = 0$, and
$a_3 = 0.2854$, which correspond to two stable polynomials $A(q^{-1})$ and $C(q^{-1})$, result in
$P = 0$. □
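The example can be checked numerically. The sketch below assumes unit-variance white input, so that the scalar $P$ for $m = 1$ is the sum of products of the impulse responses of $1/A$ and $1/C$, truncated at a length chosen here. Because $a_3$ is quoted to four decimals, the computed value is only approximately zero. For contrast, the same quantity is evaluated with $a_3 = 0$ (so $A = 1$), where it equals 1.

```python
def impulse_response(coeffs, length):
    """Impulse response of 1 / (1 + coeffs[0] q^-1 + coeffs[1] q^-2 + ...)."""
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for k, c in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc -= c * h[n - k]
        h.append(acc)
    return h


def g0(a, c, length=2000):
    """E[(1/A)x(n) * (1/C)x(n)] for white x(n) with unit variance (truncated sum)."""
    return sum(u * v for u, v in zip(impulse_response(a, length),
                                     impulse_response(c, length)))


c = [-2.4, 1.91, -0.504]           # roots of C are 0.7, 0.8 and 0.9
a_example = [0.0, 0.0, 0.2854]     # values quoted in the example
a_trivial = [0.0, 0.0, 0.0]        # A = 1, for comparison

p_example = g0(a_example, c)
p_trivial = g0(a_trivial, c)
print(p_example, p_trivial)
```

A scalar $P$ this close to zero means the $1 \times 1$ matrix is singular for all practical purposes, even though both polynomials are stable.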

Similar examples can be found for the case where m < N − 2. Therefore, it is concluded that $m \ge N-1$ represents the weakest sufficient condition.
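The coefficient values quoted in Example 1 can be checked directly. The following is a minimal sketch, assuming that the scalar $g_0$ is the zero-lag cross-moment between the outputs of $1/A$ and $1/C$ driven by the same unit-variance white input; the function names are hypothetical. Because $a_3$ is quoted to only four decimal places, the result is expected to be close to zero rather than exactly zero.

```python
def impulse_response(den_tail, length):
    """Impulse response of 1 / (1 + den_tail[0] q^-1 + den_tail[1] q^-2 + ...)."""
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for i, coef in enumerate(den_tail, start=1):
            if n - i >= 0:
                acc -= coef * h[n - i]
        h.append(acc)
    return h


def zero_lag_moment(a, c, sigma2=1.0, length=600):
    """g_0 = sigma2 * sum_n h_A[n] * h_C[n] (assumed definition of the m = 1 matrix)."""
    h_a = impulse_response(a, length)
    h_c = impulse_response(c, length)
    return sigma2 * sum(x * y for x, y in zip(h_a, h_c))


c = [-2.4, 1.91, -0.504]          # C has zeros at 0.7, 0.8 and 0.9
a_example = [0.0, 0.0, 0.2854]    # values quoted in Example 1
a_trivial = [0.0, 0.0, 0.0]       # A = 1, used only as a contrast

g0_example = zero_lag_moment(a_example, c)
g0_trivial = zero_lag_moment(a_trivial, c)

# Numerator of the closed-form expression for g_0 when N = 3, m = 1.
a1, a2, a3 = a_example
c1, c2, c3 = c
numerator = (1 - a3 * c3) ** 2 - (a1 * c3 - c2) * (c1 * a3 - a2)

print("g0 for the example coefficients:", g0_example)
print("g0 for A = 1:", g0_trivial)
print("closed-form numerator:", numerator)
```

The contrast case A = 1 is included only to show that $g_0$ is not small in general, so the near-zero value for the example coefficients reflects the specific choice of $a_3$.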

One  significant  implication  of  the  above  result  is  presented  next  where  the  connection  between  P  and  the 
stationary  points  of  MSOE  is  established  and  then  a  weaker  condition  for  the  uniqueness  is  stated. 
III. STATIONARY POINTS OF MSOE


Consider the system identification model where it is assumed that

$$y(n) = \frac{D(q^{-1})}{C(q^{-1})}\, x(n) + v(n) \qquad (7)$$

where

$$D(q^{-1}) = \sum_{i=0}^{n_d} d_i\, q^{-i}, \qquad C(q^{-1}) = 1 + \sum_{i=1}^{n_c} c_i\, q^{-i}$$

are coprime polynomials in $q^{-1}$ and v(n) is additive noise. We further assume that the zeros of $C(z^{-1})$ are inside the unit circle and that the additive noise v(n) is a zero-mean stochastic process which is independent of x(n). Let the adaptive system be an IIR filter whose input-output relation is governed by

$$\hat{y}(n) = \frac{B(q^{-1})}{A(q^{-1})}\, x(n) \qquad (8)$$

where

$$B(q^{-1}) = \sum_{i=0}^{n_b} b_i\, q^{-i}, \qquad A(q^{-1}) = 1 + \sum_{i=1}^{n_a} a_i\, q^{-i}$$


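If we define the output error by $e(n) = y(n) - \hat{y}(n)$, the MSOE is given by

$$E[e^2(n)] = E\left[\left(\left[\frac{D(q^{-1})}{C(q^{-1})} - \frac{B(q^{-1})}{A(q^{-1})}\right] x(n)\right)^{2}\right] + E[v^2(n)]. \qquad (9)$$

The stationary points of (9) are the solutions of

$$E\left[\left(\left[\frac{D(q^{-1})}{C(q^{-1})} - \frac{B(q^{-1})}{A(q^{-1})}\right] x(n)\right) \cdot \frac{B(q^{-1})}{A^{2}(q^{-1})}\, x(n-i)\right] = 0, \qquad (10)$$

$$E\left[\left(\left[\frac{D(q^{-1})}{C(q^{-1})} - \frac{B(q^{-1})}{A(q^{-1})}\right] x(n)\right) \cdot \frac{1}{A(q^{-1})}\, x(n-j)\right] = 0, \qquad (11)$$

$$1 \le i \le n_a, \qquad 0 \le j \le n_b.$$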

It is shown in [3] that (10) and (11) admit a unique minimally realizable solution if, for white input x(n),

$$n^{*} = \min(n_a - n_c,\; n_b - n_d) \ge 0 \qquad (12)$$

$$n_b + 1 - n_c \ge 0 \qquad (13)$$
The expression (12) merely states that the adaptive filter should be of sufficient order. Here, we consider the case where $n_a = n_c$ and $n_b = n_d$, which is referred to as the exactly matching (EM) adaptive IIR filter. Violating (13) may lead to the existence of local minima on the MSOE surface, as shown in [6], [7]. Weakening (13) may naturally seem to be in contradiction with this result. However, the case considered in [6] and [7] is for $n_b = 0$, $n_c = 3$ and, therefore, $n_b + 1 - n_c = -2$. Here, we show that $n_b + 1 - n_c = -1$, for the EM case, is the underlying weakest sufficient condition for uniqueness of the global minimum.


Degenerated Solutions: The existence of stable degenerated solutions is a sufficient condition for the existence of local minima [5,6,7]. For the EM case, the degenerated solutions corresponding to $n_b + 2 = n_c$ are found by setting $B(q^{-1}) = 0$ in (10) and (11). As a result, (10) vanishes and (11) is reduced to

$$P(A, C, \sigma_x^2, n_b + 1) \begin{bmatrix} d_0 \\ \vdots \\ d_{n_b} \end{bmatrix} = 0 \qquad (14)$$


Corollary 2:

For a given stable system (7) with white input, no stable polynomial $A(q^{-1})$ satisfies (14), and hence no degenerated solution exists.

Proof:

If such a stable solution exists, say $A^{*}$, then $d_0, \ldots, d_{n_b}$ can be solved for. The matrix P has to be singular at $A^{*}$ since not all $d_i$ are zero. But, according to Theorem 1, $P(A, C, \sigma_x^2, n_b + 1)$ is nonsingular for any stable $A^{*}$ and C since $n_b + 1 = n_c - 1 = n_a - 1$. □


Other Solutions: For the EM case, all the stationary, nondegenerated points of MSOE corresponding to $n_b + 2 = n_c$ which solve (10) and (11) fulfill (see [5])

$$P(A\bar{A},\, C\bar{A},\, \sigma_x^2,\, n_a + n_b - n_L + 1) \begin{bmatrix} h_0 \\ \vdots \\ h_{n_a + n_b - n_L} \end{bmatrix} = 0 \qquad (15)$$

where, for $L(q^{-1}) = 1 + \sum_{i=1}^{n_L} l_i\, q^{-i}$ $(n_L \ge 0)$,

$$A(q^{-1}) = \bar{A}(q^{-1})\, L(q^{-1}), \qquad B(q^{-1}) = \bar{B}(q^{-1})\, L(q^{-1})$$

in which $\bar{A}(q^{-1})$ and $\bar{B}(q^{-1})$ are coprime. Also, the $h_i$ are such that

$$\sum_{i=0}^{n_a + n_b - n_L} h_i\, q^{-i} = \bar{A}(q^{-1})\, D(q^{-1}) - \bar{B}(q^{-1})\, C(q^{-1}) \qquad (16)$$

Theorem 2:

Consider an exactly matching adaptive IIR filter in which $n_b + 2 \ge n_c$. The MSOE surface of this filter is unimodal, with a unique global minimum, when the input is white.

Proof:

We note that

$$N = \deg(A\bar{A}) = \deg(C\bar{A}) = 2n_a - n_L, \qquad m = n_a + n_b - n_L + 1$$

and since

$$N - m = n_a - n_b - 1 \le 1,$$

Theorem 1 implies the nonsingularity of $P(A\bar{A}, C\bar{A}, \sigma_x^2, n_a + n_b - n_L + 1)$ for any value of $n_L$. Therefore, (15) yields

$$h_i = 0, \qquad i = 0, \ldots, n_a + n_b - n_L \qquad (17)$$

Using (17) in equation (16) reveals that

$$\frac{\bar{B}(q^{-1})}{\bar{A}(q^{-1})} = \frac{D(q^{-1})}{C(q^{-1})} \qquad (18)$$

Since $C(q^{-1})$ and $D(q^{-1})$ are coprime polynomials, (18) implies that

$$\bar{A}(q^{-1}) = C(q^{-1}), \qquad \bar{B}(q^{-1}) = D(q^{-1})$$

But this can happen only when $n_L = 0$, because the filter is exactly matching. Therefore, there exists a unique stationary point which is the unique global minimum of MSOE and is given by

$$A(q^{-1}) = C(q^{-1}), \qquad B(q^{-1}) = D(q^{-1}) \qquad (19)$$


This is a weaker sufficient condition than (13) for the EM filters. □
Example 2:

Consider the case where $n_b = n_d = 0$ and $n_a = n_c = 2$, which was considered by Stearns [9]. The MSOE surfaces of filters in this class were observed to be unimodal by examining different pole locations of the unknown system. Theorem 2 provides a proof in support of this observation. □
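For the class in Example 2, the MSOE in (9) can be evaluated numerically for a white input, since it then reduces to the input variance times the sum of squared differences between the two impulse responses, plus the noise variance. The sketch below is illustrative only: the system coefficients, the off-target test point, and the function names are arbitrary assumptions, and the check confirms that the matched point is a stationary point with zero excess error. It does not by itself demonstrate unimodality.

```python
def impulse_response(num, den_tail, length):
    """Impulse response of num(q^-1) / (1 + den_tail[0] q^-1 + den_tail[1] q^-2 + ...)."""
    h = []
    for n in range(length):
        acc = num[n] if n < len(num) else 0.0
        for i, coef in enumerate(den_tail, start=1):
            if n - i >= 0:
                acc -= coef * h[n - i]
        h.append(acc)
    return h


def msoe(b, a, d, c, sigma2_x=1.0, sigma2_v=0.0, length=2000):
    """E[e^2] for white x: sigma2_x * sum_n (h_sys[n] - h_filt[n])^2 + sigma2_v."""
    h_sys = impulse_response(d, c, length)
    h_filt = impulse_response(b, a, length)
    return sigma2_x * sum((s - f) ** 2 for s, f in zip(h_sys, h_filt)) + sigma2_v


# Illustrative unknown system with n_d = 0, n_c = 2 (poles at 0.6 +/- 0.4j).
d = [1.0]
c = [-1.2, 0.52]

msoe_matched = msoe(b=d, a=c, d=d, c=c)           # A = C, B = D
msoe_off = msoe(b=[1.0], a=[-1.0, 0.4], d=d, c=c)  # an arbitrary stable mismatch

# Central-difference gradient of the MSOE at the matched point.
theta = [d[0], c[0], c[1]]   # (b0, a1, a2)
eps = 1e-5
grad = []
for k in range(3):
    up = list(theta)
    down = list(theta)
    up[k] += eps
    down[k] -= eps
    f_up = msoe([up[0]], up[1:], d, c)
    f_down = msoe([down[0]], down[1:], d, c)
    grad.append((f_up - f_down) / (2 * eps))

print("MSOE at matched point:", msoe_matched)
print("MSOE at mismatched point:", msoe_off)
print("finite-difference gradient at matched point:", grad)
```

A fuller numerical study in the spirit of the one attributed to Stearns would sweep the filter coefficients over the stability region for several pole locations; that is outside the scope of this sketch.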

IV.  CONCLUSION 

A nonsymmetric Toeplitz matrix was introduced and a weaker sufficient condition for its invertibility was presented. This weaker condition was used to conclude that if the adaptive IIR filter is exactly matching, the MSOE surface is unimodal if $(n_b + 2) - n_c \ge 0$. This is a weaker sufficient condition than what is reported in the literature [3]. In fact, this can be regarded as the weakest sufficient condition in general.
Proof of Lemma 2:
First, note that

$$a^{T} \phi_N(n) = x(n+1) - \frac{1}{A(q^{-1})}\, x(n+1), \qquad c^{T} \psi_N(n) = x(n+1) - \frac{1}{C(q^{-1})}\, x(n+1)$$

since x(n) is white. Also,

Similarly,

Finally, straightforward calculations suggest that

Therefore,

where the right side of the equality in (A6) follows immediately using (A2), (A3), (A4), and (A5). But this is an alternative representation of (5) and the proof is complete. □


Proof  of  Lemma  3: 

(i) First, let the $p_i$'s be distinct. Now let


A,  =  Residue  of 


A(z-‘)  C(z) 


at  z  =  Pi 


C  (p, )  fl  (Pi  -  Pj  > 

j=  i 


Then,  Equation  (3)  is  reduced  to 

gj-i  =  S  Pk~l~'  h  Pi 

k  =  1 

For the special case when m = N, using (A8) in (1) gives

P(A  ,C,r  JV)  =  V,  A  V, 

where 
(A8)

(A9) 


Therefore,  since  the  involved  matrices  in  (A9)  are  N  xN,  it  follows  that 
This result was derived under the assumption that the $p_i$'s are distinct. But, since the determinant is a continuous (actually analytic) function of the elements of a matrix, and since each element of the matrix P is analytic for stable polynomials $A(z^{-1})$ and $C(z^{-1})$ (see [2]), $\Delta_m$ is analytic and therefore continuous. It then follows that if $A(z^{-1})$ has multiple poles, $\Delta_N$ is given by (A10).


(ii) Lemma 2 implies that

(A11)


The determinants of the companion form matrices in the left side of (A11) are equal to $(-1)^{N} a_N$ and $(-1)^{N} c_N$, respectively. As a result, (A11) can be written as

$$a_N c_N\, \Delta_N = \det \begin{bmatrix} g_0 - \sigma_x^2 & g_1 & \cdots & g_{N-1} \\ g_{-1} & g_0 & \cdots & g_{N-2} \\ \vdots & & \ddots & \vdots \\ g_{1-N} & \cdots & g_{-1} & g_0 \end{bmatrix} = \Delta_N - \sigma_x^2\, \Delta_{N-1} \qquad (A12)$$

where the last equality follows by evaluating the determinant with respect to the first row or the first column. Therefore,

$$\Delta_{N-1} = \frac{1}{\sigma_x^2}\,(1 - a_N c_N)\,\Delta_N \qquad (A13)$$

and the proof is complete. □
A class of algorithms is presented for training nonlinear feedforward neural networks using purely "linear" techniques. The algorithms are based upon linearizations of the network using error surface analysis, followed by a contemporary recursive least squares identification procedure which can be implemented using parallel processing. Specific algorithms are presented to estimate weights node-wise, layer-wise, and for estimating the entire set of network weights simultaneously. A procedure for modifying the algorithms to selectively use the training data and increase speed is also presented. A computationally inexpensive measure is developed with which to assess the effect of a particular training pattern on the weight estimates prior to its inclusion in any iteration. Data which do not significantly change the weights are not used in that iteration, obviating the computational expense of updating. Several experimental studies are presented showing the advantages of this class of algorithms. Specifically, the layer-wise algorithm is shown to be vastly superior to back-propagation in terms of the number of convergences and convergence rate. Additionally, this algorithm is shown to be insensitive to the choice of initial weights and forgetting factor, eliminating two of the greatest problems in the implementation of existing training algorithms.
This research is concerned with a particular class of optimal bounding ellipsoid (OBE) algorithms which implements an optimization criterion based on the volume of the optimal ellipsoid. The OBE algorithms belong to set membership (SM) identification techniques and are used to identify the parameters of linear system or signal models based on a priori information about the pointwise "energy bounds" on the error sequence. OBE algorithms define a set of solutions that takes the form of a "hyperellipsoid" in the parameter space. This ellipsoid is centered around the familiar WRLS estimate.

In this work, the convergence behavior of the ellipsoid for the class of OBE algorithms that utilizes the volume ratio measure is studied under different types of disturbances. The non-persistency in the excitation of the disturbances may result in the degeneration of the ellipsoid. The convergence of the ellipsoid under both persistently and non-persistently exciting colored noise is particularly investigated.

The conventional OBE algorithms with volume ratio measure employ a data selection strategy which is based on minimizing the volume of the ellipsoid and finding an optimal error minimization weight to be associated with the present datum. In this work, a new OBE algorithm, the set membership past weight optimization (SM-PWO), is developed. The data selection technique in this algorithm is based on minimizing


